NEISSERIAL ANTIGENS

This invention relates to antigens from Neisseria bacteria. 
BACKGROUND ART 

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, gram-negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp.817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in
the United States and developed countries. The meningococcal vaccine currently in use is a
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although
efficacious in adolescents and adults, it induces a poor immune response and short duration of
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol.46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak
immune response that cannot be boosted by repeated immunization. Following the success of the
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate
vaccine against meningococcus A and C. Vaccine 10:691-698).

Meningococcus B remains a problem, however. This serotype currently is responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous
strain, but are not effective at large where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect. Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus
B, some could be components of vaccines against all meningococcal serotypes, and others could
be components of vaccines against all pathogenic Neisseriae.

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the
examples. These sequences relate to N. meningitidis or N. gonorrhoeae. 

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the 
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence, 
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These 
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication of
functional equivalence. Identity between the proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty=12 and gap extension penalty=1.
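
As an illustration only, the identity determination described above can be sketched as a Smith-Waterman local alignment with affine gaps (gap open penalty=12, gap extension penalty=1). The sketch is not the MPSRCH implementation: the match/mismatch scores (+5/-4) stand in for whatever substitution matrix the program applies, a gap of length k is assumed to cost 12 + (k-1), and identity is assumed to be counted over the aligned region including gap columns. The function names are hypothetical.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_ext=1):
    """Best local alignment of a and b with affine gaps (Gotoh recurrences).

    Returns (score, identical_positions, aligned_length).
    Scoring values other than the two gap penalties are assumptions.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting identical and aligned columns.
    i, j = best_pos
    state, identical, length = "H", 0, 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + sub:
                identical += a[i - 1] == b[j - 1]
                length += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            length += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # this column opened the gap
            j -= 1
        else:
            length += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"  # this column opened the gap
            i -= 1
    return best, identical, length


def percent_identity(a, b):
    """Percent identity over the best local alignment (0.0 if none)."""
    _, identical, length = smith_waterman_affine(a, b)
    return 100.0 * identical / length if length else 0.0
```

Under these assumptions, two sequences would fall within the preferred range when percent_identity returns a value greater than 50.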

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino 
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12, 
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence. 
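
The fragment criterion above (at least n consecutive amino acids, n being 7 or more) can be illustrated with a short sliding-window sketch. The function names and the example strings are hypothetical and are not sequences from the examples; the sketch only tests for shared runs of n residues and says nothing about whether a fragment comprises an epitope.

```python
def fragments(seq, n=7):
    """Yield every run of n consecutive residues in seq."""
    for start in range(len(seq) - n + 1):
        yield seq[start:start + n]


def shares_fragment(candidate, reference, n=7):
    """True if candidate contains at least n consecutive residues of reference."""
    reference_fragments = set(fragments(reference, n))
    return any(frag in reference_fragments for frag in fragments(candidate, n))


# Hypothetical usage with placeholder strings (not real Neisserial sequences):
# shares_fragment("XXBCDEFGHXX", "ABCDEFGHI", n=7)
```

Raising n (eg. to 8, 10, 12 or more) makes the criterion stricter, since longer shared runs are required.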

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These
may be polyclonal or monoclonal and may be produced by any suitable means. 

According to a further aspect, the invention provides nucleic acid comprising the Neisserial 
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid 
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide 
sequences disclosed in the examples.

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid 
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1xSSC,
0.5% SDS solution). 

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences 
complementary to those described above (eg. for antisense or probing purposes).
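
By way of a minimal sketch, a complementary sequence for probing purposes can be derived as the reverse complement of a DNA sequence. The function names and the example strings are hypothetical; only unambiguous DNA bases (A, C, G, T) are handled, and exact complementarity is tested rather than hybridisation under any particular stringency.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_exact_probe_for(probe, target):
    """True if probe is exactly complementary to some region of target.

    Both sequences are written 5' to 3'; the probe pairs with the target
    strand wherever the probe's reverse complement occurs in the target.
    """
    return reverse_complement(probe) in target.upper()


# Hypothetical usage with placeholder sequences:
# reverse_complement("ATGCCG")
# is_exact_probe_for("CGGCAT", "GGATGCCGTT")
```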

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as 
those containing modified backbones, and also peptide nucleic acids (PNA) etc.

According to a further aspect, the invention provides vectors comprising nucleotide sequences of
the invention (eg. expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody, 
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines, 
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use 
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid, 
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating 
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the 
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain
B or strain C). 

The invention also provides a method of treating a patient, comprising administering to the patient 
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention. 

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means. 

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a) 
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes. 

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows. 
This summary is not a limitation on the invention but, rather, gives examples that may be used, but 
are not required. 

General

The practice of the present invention will employ, unless otherwise indicated, conventional 
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are 
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook 
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular 
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice, 
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes 
I-IV (D.M. Weir and C. C. Blackwell eds 1986). 

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions 

A composition containing X is "substantially free of" Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of 
X+Y in the composition, more preferably at least about 95% or even 99% by weight. 
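
The weight criterion in this definition is simple arithmetic: the fraction X/(X+Y) is compared against the stated threshold (85%, or preferably about 90%, 95% or 99%). A minimal sketch follows; the function names are hypothetical and the weights may be in any consistent unit.

```python
def weight_fraction_of_x(x_weight, y_weight):
    """Fraction by weight of X in the total of X+Y."""
    total = x_weight + y_weight
    if total <= 0:
        raise ValueError("total weight of X+Y must be positive")
    return x_weight / total


def substantially_free_of_y(x_weight, y_weight, threshold=0.85):
    """True if X makes up at least `threshold` of the total X+Y by weight."""
    return weight_fraction_of_x(x_weight, y_weight) >= threshold


# Hypothetical usage: 9.2 mg of X with 0.8 mg of Y gives a fraction of 0.92,
# which satisfies the 85% and 90% thresholds but not the 95% threshold.
```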

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising" 
X may consist exclusively of X or may include something additional to X, such as X+Y.

The term "heterologous" refers to two biological components that are not found together in nature.
The components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an 
arrangement not found in nature. 

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of 
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having 
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the 
degree of sequence identity between the native or disclosed sequence and the mutant sequence is 
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid
molecule, or region, that occurs essentially at the same locus in the genome of another or second
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of 
the gene, such as in regulatory control regions (eg. see US patent 5,753,235). 

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems; 
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA 
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences 
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include 
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late
promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter, expression
can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped orientation,
or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be 
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the 
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired,
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide. 

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating 
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that 
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing 
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells. 

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns 
with functional splice donor and acceptor sites, and leader sequences may also be included in an 
expression construct, if desired. Expression constructs are often maintained in a replicon, such as 
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al.
(1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for
introduction of heterologous polynucleotides into mammalian cells are known in the art and include 
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, 
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct 
microinjection of the DNA into nuclei. 

Mammalian cell lines available as hosts for expression are known in the art and include many
immortalized cell lines available from the American Type Culture Collection (ATCC), including
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK)
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines. 

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, 
and is operably linked to the control elements within that vector. Vector construction employs 
techniques which are known in the art. Generally, the components of the expression system include 
a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient restriction site for insertion of the heterologous gene or genes to be
expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the heterologous gene into
the baculovirus genome); and appropriate insect host cells and growth media. 

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the 
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above 
described components, comprising a promoter, leader (if desired), coding sequence of interest, and 
transcription termination sequence, are usually assembled into an intermediate transplacement 
construct (transfer vector). This construct may contain a single gene and operably linked regulatory 
elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple
genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are
often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of stable
maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing 
it to be maintained in a suitable host for cloning and amplification. 

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 17:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol. 42:177) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli. 

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any 
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:165.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or 
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; mouse IL-3, (Miyajima et
al., (1987) Gene 58:273); and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects. 

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed 
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused 
foreign proteins usually requires heterologous genes that ideally have a short leader sequence 
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with 
cyanogen bromide. 

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted 
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised 
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids 
which direct the translocation of the protein into the endoplasmic reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor 
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer 
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene. 
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin 
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is 
positioned downstream of the polyhedrin promoter. 

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 µm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
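
The practical weight of the 1% to 5% recombination frequency can be illustrated with a short
calculation of how many plaques might need to be examined before a recombinant is likely to be
seen. This is only a sketch: it assumes, which the text does not state, that each plaque is
independently recombinant with probability equal to the recombination frequency, and the function
name and the 95% target are illustrative choices.

```python
import math


def plaques_to_screen(recomb_freq: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one recombinant among n plaques) >= confidence.

    Assumes each plaque is independently recombinant with probability
    recomb_freq (a simplifying assumption, not taken from the text).
    """
    if not 0.0 < recomb_freq < 1.0:
        raise ValueError("recomb_freq must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(no recombinant in n plaques) = (1 - p)**n, which must be <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - recomb_freq))


if __name__ == "__main__":
    for freq in (0.01, 0.05):
        print(f"frequency {freq:.0%}: screen about {plaques_to_screen(freq)} plaques")
```

Under these assumptions the count falls from a few hundred plaques at 1% to a few dozen at 5%,
which is consistent with the text's point that a rapid visual screen is an advantage.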

Recombinant baculovirus expression vectors have been developed for infection into several insect
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally
known to those skilled in the art. See, eg. Summers and Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product
gene is under inducible control, the host may be grown to high density, and expression induced.
Alternatively, where expression is constitutive, the product will be continuously expressed into the
medium and the nutrient medium must be continuously circulated, while removing the product of
interest and augmenting depleted nutrients. The product may be purified by such techniques as
chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography, etc.;
electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the
product may be further purified, as required, so as to remove substantially any insect proteins which
are also secreted in the medium or result from lysis of insect cells, so as to provide a product which
is at least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are
incubated under conditions which allow expression of the recombinant protein encoding sequence.
These conditions will vary, dependent upon the host cell selected. However, the conditions are
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art.

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art.
Exemplary plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found in addition to the references described above in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone, gibberellic acid and secreted enzymes induced by
gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream
and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome 
are also recommended. These might include transposon sequences and the like for homologous 
recombination as well as Ti sequences which permit random insertion of a heterologous expression 
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions 
may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette
for expression of the protein(s) of interest. Usually, there will be only one expression cassette,
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes
equipped with one), and a transcription and translation termination sequence. Unique restriction
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The
sequence encoding the protein of interest will encode a signal peptide which allows processing and
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically,
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eukaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code
(Reed and Maniatis, Cell 41:95-105, 1985).

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. Crossway, Mol. Gen. Genet., 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc. Natl.
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the
presence of plasmids containing the gene construct. Electrical impulses of high field strength
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural
embryos to form plants. The culture media will generally contain various amino acids and
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the
history of the culture. If these three variables are controlled, then regeneration is fully reproducible
and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or,
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the
invention is secreted into the medium, it may be collected. Alternatively, the embryos and
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will then be used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site.
A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters.
For example, transcription activation sequences of one bacterial or bacteriophage promoter may
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual],
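
The spacing rule described for the Shine-Dalgarno sequence can be sketched as a simple sequence
scan. The example below is illustrative only: the consensus AGGAGG, the minimum match length of
4 nucleotides, and the function name find_sd_sites are assumptions introduced here and are not
taken from the text. Only the 3-11 nucleotide spacing window comes from the description above.

```python
from typing import List, Tuple

# Assumed consensus motif; the text itself does not give a sequence.
SD_CONSENSUS = "AGGAGG"


def find_sd_sites(dna: str, min_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (atg_index, spacing, motif) for each ATG preceded by an SD-like motif.

    spacing is the number of nucleotides between the 3' end of the motif and
    the ATG; only spacings of 3 to 11 nucleotides are accepted.
    """
    dna = dna.upper()
    # Every substring of the consensus with at least min_len bases,
    # longest first, ties broken alphabetically for reproducible output.
    motifs = sorted(
        {
            SD_CONSENSUS[i:j]
            for i in range(len(SD_CONSENSUS))
            for j in range(i + min_len, len(SD_CONSENSUS) + 1)
        },
        key=lambda m: (-len(m), m),
    )
    hits: List[Tuple[int, int, str]] = []
    atg = dna.find("ATG")
    while atg != -1:
        best = None
        for motif in motifs:
            for spacing in range(3, 12):
                start = atg - spacing - len(motif)
                if start >= 0 and dna[start:start + len(motif)] == motif:
                    best = (atg, spacing, motif)
                    break
            if best is not None:
                break
        if best is not None:
            hits.append(best)
        atg = dna.find("ATG", atg + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: motif, six-nucleotide spacer, then ATG.
    print(find_sd_sites("TTTAGGAGGTTTTTTATGAAA"))
```

A scan of this kind only flags candidate sites; it says nothing about how efficiently a given
site is used in vivo.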

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably
retains a site for a processing enzyme (eg. ubiquitin-specific processing protease) to cleave the
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated
[Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules 
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion 
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes 
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the
cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic
space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably
there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal
peptide fragment and the foreign gene.
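
The hydrophobic character of a signal peptide can be illustrated with a sliding-window hydropathy
calculation over the N-terminal region of a protein sequence. The sketch below uses the published
Kyte-Doolittle hydropathy scale; the window size, the 30-residue N-terminal region, the threshold
of 1.6, the function names and both test sequences are illustrative assumptions and do not come
from the text. It is a crude screen, not a signal peptide predictor.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein: str, window: int = 8, n_term: int = 30) -> float:
    """Highest mean hydropathy of any window within the first n_term residues.

    Raises KeyError for non-standard residues and ValueError if the
    N-terminal region is shorter than the window.
    """
    region = protein.upper()[:n_term]
    if len(region) < window:
        raise ValueError("sequence is shorter than the window")
    scores = [KYTE_DOOLITTLE[aa] for aa in region]
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


def has_hydrophobic_n_terminus(protein: str, threshold: float = 1.6) -> bool:
    """True if some N-terminal window reaches the (assumed) hydropathy threshold."""
    return max_window_hydropathy(protein) >= threshold


if __name__ == "__main__":
    # Hypothetical sequences written for this example only.
    leader_like = "MKKLLALLALLALLASAQADEKDEKDEKDEK"
    hydrophilic = "MSDEKRNQSDEKRNQSDEKRNQSDEKRNQ"
    print(has_hydrophobic_n_terminus(leader_like))
    print(has_hydrophobic_n_terminus(hydrophilic))
```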

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins,
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription.
Examples include transcription termination sequences derived from genes with strong promoters,
such as the trp gene in E. coli as well as other biosynthetic genes.
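
A sequence capable of forming a stem loop contains an inverted repeat: a stretch of bases followed,
after a short loop, by its reverse complement. The sketch below searches a DNA sequence for such
repeats. It is a simplified illustration, assuming perfect Watson-Crick pairing only and using
illustrative limits (stem of at least 6 base pairs, loop of 3 to 8 nucleotides) and a function
name that are not taken from the text; it makes no claim about whether a hit actually terminates
transcription.

```python
from typing import List, Tuple

# Watson-Crick base pairs at the DNA level (no mismatches or wobble pairs).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def find_stem_loops(
    dna: str, min_stem: int = 6, min_loop: int = 3, max_loop: int = 8
) -> List[Tuple[int, int, int]]:
    """Return (stem_start, stem_len, loop_len) for each perfect inverted repeat.

    For every candidate loop, the stem is extended outward from the loop one
    base pair at a time until a non-pairing position or a sequence end is hit.
    stem_start is the index of the first base of the 5' arm.
    """
    dna = dna.upper()
    n = len(dna)
    hits: List[Tuple[int, int, int]] = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # last base of the 5' arm
            right = loop_start + loop_len  # first base of the 3' arm
            stem = 0
            while left >= 0 and right < n and (dna[left], dna[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, stem, loop_len))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 8 bp arm, 4 nt loop, reverse-complement arm.
    seq = "AAAA" + "GCCCGCAG" + "TTCG" + "CTGCGGGC" + "AAAA"
    for hit in find_stem_loops(seq):
        print(hit)
```

Because the innermost base pairs of a long stem can also be counted as part of a larger loop,
one hairpin may be reported more than once with different stem and loop lengths; the entry with
the longest stem is usually the one of interest.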

Usually, the above described components, comprising a promoter, signal sequence (if desired),
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of
the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from
recombinations between homologous DNA in the vector and the bacterial chromosome. For
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of bacterial strains that have been transformed. Selectable markers can 
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs 
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline 
[Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655]; Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:173, Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas]; [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Eur. Cong. Biotechnology 1:412, Streptococcus].

v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence.
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene.
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase,
glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with
cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian,
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO 88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples include transcription terminator sequences and other yeast-recognized
termination sequences, such as those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al (1979) Gene 8:17-24], pCl/1 [Brake et al
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al (1982) J. Mol.
Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See
eg. Brake et al, supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz, et al. (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze, et al. (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson, et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das, et al (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al
(1990) Bio/Technology 8:135], Pichia guilliermondii [Kunze et al (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg, et al (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow, et al. (1985) Curr. Genet. 10:39;
Gaillardin, et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al (1983) J.
Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg
et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P
and 125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by its
ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand molecule
with high specificity, as for example in the case of an antigen and a monoclonal antibody specific
therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A,
and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may
serve in several different modes. For example, 125I may serve as a radioactive label or as an
electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may combine
various labels for desired effect. For example, MAbs and avidin also require labels in the practice of
this invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled
with 125I, or with an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be
readily apparent to those of ordinary skill in the art, and are considered as equivalents within the scope
of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered. 
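
By way of illustration only, the arithmetic implied by the dose ranges above can be sketched in Python as follows. The function name, the example body weight of 70 kg and the treatment of the stated bounds as simple per-kilogram multipliers are assumptions made for this sketch; they are not part of the invention and do not replace the judgement of the clinician.

```python
# Illustrative sketch: convert a per-kilogram dose range into an absolute
# range for one individual. All names here are hypothetical.

BROAD_RANGE_MG_PER_KG = (0.01, 50.0)   # "about 0.01 mg/kg to 50 mg/kg"
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)  # "0.05 mg/kg to about 10 mg/kg"


def dose_range_mg(weight_kg, range_mg_per_kg):
    """Return (lowest, highest) absolute dose in mg for the given body weight."""
    if weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = range_mg_per_kg
    return (weight_kg * low, weight_kg * high)


if __name__ == "__main__":
    example_weight_kg = 70.0  # assumed example value, not taken from the text
    for label, rng in (("broad", BROAD_RANGE_MG_PER_KG),
                       ("narrow", NARROW_RANGE_MG_PER_KG)):
        low_mg, high_mg = dose_range_mg(example_weight_kg, rng)
        print(f"{label}: {low_mg:.2f} mg to {high_mg:.2f} mg")
```

For the assumed 70 kg individual the broad range corresponds to roughly 0.7 mg to 3500 mg of DNA construct, and the narrow range to roughly 3.5 mg to 700 mg.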

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such
as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity. Suitable
carriers may be large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus
particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as 
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids 
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of 
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack 
Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water,
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents,
pH buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic
compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes
are included within the definition of a pharmaceutically acceptable carrier.

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or 
therapeutic (ie. to treat disease after infection). 

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does
not itself induce the production of antibodies harmful to the individual receiving the composition.
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well
known to those of ordinary skill in the art. Additionally, these carriers may function as
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated to
a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion
or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi
Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial
cell wall components from the group consisting of monophosphoryl lipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin
adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used or particles
generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.
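
The emulsion compositions recited above can be converted into per-batch quantities by simple proportion, as in the following Python sketch. The sketch assumes, for illustration only, that the stated percentages are volume/volume and that the remainder of the batch is aqueous phase; the text does not state the basis of the percentages. The dictionary and function names are hypothetical, and components given without an amount (thr-MDP, MTP-PE, bacterial cell wall components) are omitted.

```python
# Illustrative sketch: per-batch amounts for the oil-in-water emulsions
# described in the text, ASSUMING the percentages are volume/volume.

FORMULATIONS_PERCENT = {
    "MF59": {"squalene": 5.0, "Tween 80": 0.5, "Span 85": 0.5},
    "SAF": {"squalane": 10.0, "Tween 80": 0.4,
            "pluronic-blocked polymer L121": 5.0},
    "RAS": {"squalene": 2.0, "Tween 80": 0.2},
}

REMAINDER_KEY = "remainder (assumed aqueous phase)"


def batch_volumes_ml(formulation, batch_ml):
    """Return a dict of component volumes (ml) for a batch of the given size."""
    if batch_ml <= 0:
        raise ValueError("batch volume must be positive")
    percents = FORMULATIONS_PERCENT[formulation]
    volumes = {name: batch_ml * pct / 100.0 for name, pct in percents.items()}
    # Whatever is not accounted for by the listed components is treated
    # as aqueous phase (an assumption of this sketch).
    volumes[REMAINDER_KEY] = batch_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, ml in batch_volumes_ml("MF59", 100.0).items():
        print(f"{component}: {ml:.2f} ml")
```

Under these assumptions, a 100 ml batch of the MF59-type emulsion would contain about 5 ml squalene, 0.5 ml Tween 80 and 0.5 ml Span 85, with the balance being aqueous phase.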

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation,
and other relevant factors. It is expected that the amount will fall in a relatively broad range that
can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, 
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). 
Additional formulations suitable for other modes of administration include oral and pulmonary 
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose 
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents. 

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al (1997) Annu Rev Immunol 
15:617-648; see later herein]. 

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of 
the invention, to be delivered to the mammal for expression in the mammal, can be administered 
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated. 

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector
is employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for
example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses
(eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291)), spumaviruses and lentiviruses. See RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For 
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of 
second strand synthesis from an Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.

Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adeno-associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
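
The numerical limits given above for the modified D-sequences (20 native nucleotides, of which 5 to 18, preferably 10 to 18, and most preferably 10 are retained) can be checked mechanically. The following Python sketch is illustrative only: it covers the substitution case, in which the modified sequence keeps its length of 20, and does not model deletions. The native sequence used is an arbitrary placeholder and not the actual AAV D-sequence, and all function names are hypothetical.

```python
# Illustrative sketch: count how many native D-sequence positions are retained
# in a substituted D-sequence and compare the count with the ranges in the text.

D_SEQUENCE_LENGTH = 20  # "sequences of 20 consecutive nucleotides"


def count_retained_native(native, modified):
    """Count positions at which the modified sequence keeps the native nucleotide."""
    if len(native) != D_SEQUENCE_LENGTH or len(modified) != D_SEQUENCE_LENGTH:
        raise ValueError("this sketch only handles 20-nucleotide sequences")
    return sum(1 for n, m in zip(native.upper(), modified.upper()) if n == m)


def classify_retention(retained):
    """Map a retained-nucleotide count onto the preference tiers in the text."""
    if retained == 10:
        return "most preferred"
    if 10 <= retained <= 18:
        return "preferred"
    if 5 <= retained <= 18:
        return "within stated range"
    return "outside stated range"


def substitute(nucleotide):
    """Return a nucleotide different from the input (any non-native choice works)."""
    return {"A": "C", "C": "G", "G": "T", "T": "A"}[nucleotide]


# Placeholder only: NOT the real AAV D-sequence.
PLACEHOLDER_NATIVE = "ACGT" * 5

# Keep the first 10 positions native and substitute the remaining 10.
EXAMPLE_MODIFIED = PLACEHOLDER_NATIVE[:10] + "".join(
    substitute(n) for n in PLACEHOLDER_NATIVE[10:]
)

if __name__ == "__main__":
    retained = count_retained_native(PLACEHOLDER_NATIVE, EXAMPLE_MODIFIED)
    print(retained, classify_retention(retained))
```

Which positions are retained is not specified in the text; the choice of the first ten positions in the example is arbitrary.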

Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479. 

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19
and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors. Also employable are togaviruses, Semliki
Forest virus (ATCC VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370), Ross River virus
(ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879, and
WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN
08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered
expression systems. Preferably, the eukaryotic layered expression systems of the invention are
derived from alphavirus vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990)
J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC
VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317;
Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US
4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those described in
Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza virus, for
example ATCC VR-797 and recombinant influenza viruses made employing reverse genetics
techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805;
Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael
(1983) N Engl J Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human
immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol 66:2731;
measles virus, for example ATCC VR-67 and VR-1247 and those described in EP-0440219; Aura
virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240;
Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC
VR-369 and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for
example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu
virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245;
Tonate virus, for example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for
example ATCC VR-374; Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example
ATCC VR-375; O'Nyong virus; Eastern encephalitis virus, for example ATCC VR-65 and ATCC
VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622
and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre
(1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral 
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987,
eukaryotic cell delivery vehicle cells, for example see US Serial No. 08/240,030, filed May 9,
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials,
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867.
Briefly, the sequence can be inserted into conventional vectors that contain conventional control
sequences for high level expression, and then incubated with synthetic gene transfer molecules such
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem.
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in 
WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex 
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by the 
beads. The method may be improved further by treatment of the beads to increase hydrophobicity and
thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional
vectors that contain conventional control sequences for high level expression, and then be incubated
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine,
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin,
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such
as the approach described in Woffendin et al (1994) Proc. Natl Acad. Sci. USA
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
a hand-held gene transfer particle gun, as described in US 5,149,655, and use of ionizing radiation for
activating a transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120
and 4,762,915; in WO95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer,
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs
in the individual to which it is administered.
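
As an illustration only, the per-kilogram ranges stated above can be converted into a total amount of DNA construct for a subject of a given body mass. The following minimal Python sketch assumes a hypothetical 70 kg subject; the function name and its defaults are introduced here for the example and are not part of the disclosure.

```python
def dose_range_mg(body_mass_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (minimum, maximum) total dose in mg for the given body mass.

    The defaults are the broad range quoted in the text (0.01 to 50 mg/kg);
    pass 0.05 and 10.0 for the narrower range (0.05 to 10 mg/kg).
    """
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return (low_mg_per_kg * body_mass_kg, high_mg_per_kg * body_mass_kg)


# Hypothetical 70 kg subject (an assumption made only for this example).
broad_range = dose_range_mg(70)
narrow_range = dose_range_mg(70, low_mg_per_kg=0.05, high_mg_per_kg=10.0)
print("broad range (mg):", broad_range)
print("narrow range (mg):", narrow_range)
```

The calculation is a simple scaling and says nothing about how a dose within the range would be chosen for a particular subject.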

Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly 
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression 
of recombinant proteins. The subjects to be treated can be mammals or birds. Also, human subjects 
can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic
cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished
by the following procedures, for example, dextran-mediated transfection, calcium phosphate
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well
known in the art.

Polynucleotide and polypeptide pharmaceutical compositions

In addition to the pharmaceutically acceptable carriers and salts described above, the following
additional agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and
erythropoietin. Viral antigens, such as envelope proteins, can also be used, as can proteins from
other invasive organisms, such as the 17 amino acid peptide from the circumsporozoite protein of
Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, 
thyroid hormone, or vitamins, folic acid. 

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is
dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included.

D. Lipids, and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes 
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the
use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight (1991) Biochim.
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged),
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand
Island, NY. (See, also, Felgner supra). Other commercially available liposomes include
Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be
prepared from readily available materials using techniques well known in the art. See, eg. Szoka
(1978) Proc. Natl Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC),
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others.
These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate
ratios. Methods for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs),
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629;
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl Acad. Sci. USA
76:3348; Enoch & Strittmatter (1979) Proc. Natl Acad. Sci. USA 76:145; Fraley (1980) J. Biol
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl Acad. Sci. USA 75:145;
and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered.
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants,
fragments, or fusions of these proteins can also be used. Also, modifications of naturally occurring
lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the delivery of
polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with
the polynucleotide to be delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals: AI, AII,
AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C, and E apoproteins; over time these lipoproteins lose A and acquire
C and E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and
HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow
(1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem
261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and
association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460
and Mahey (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example,
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta
30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be
found in Zuckermann et al PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired
polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can
be used to deliver nucleic acids to a living subject intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin,
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such
as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos,
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that
bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from the
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic
agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or,
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods.
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and
a variety of these are known in the art. Protocols for the immunoassay may be based, for example,
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye
molecules. Assays which amplify the signals from the probe are also known; examples of which
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such
as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed
by packaging the appropriate materials, including the compositions of the invention, in suitable
containers, along with the remaining reagents and materials (for example, suitable buffers, salt
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in solution.
Then, the two sequences will be placed in contact with one another under conditions that favor
hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; reaction
temperature; time of hybridization; agitation; agents to block the non-specific attachment of the liquid
phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences;
use of compounds to increase the rate of association of sequences (dextran sulfate or polyethylene
glycol); and the stringency of the washing conditions following hybridization. See Sambrook et al.
[supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt
concentration should be chosen that is approximately 12° to 20°C below the calculated Tm of the
hybrid under study. The temperature and salt conditions can often be determined empirically in 
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the
DNA being blotted and (2) the homology between the probe and the sequences being detected. The
total amount of the fragment(s) to be studied can vary by a magnitude of 10, from 0.1 to 1 µg for a
plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a single copy gene in a highly complex eukaryotic
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only
1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with
a probe of 10⁸ cpm/µg. For a single-copy mammalian gene a conservative approach would start
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate
using a probe of greater than 10⁸ cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe
and the fragment of interest, and consequently, the appropriate conditions for hybridization and
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly
encountered variables include the length and total G+C content of the hybridizing sequences and
the ionic strength and formamide content of the hybridization buffer. The effects of all of these
factors can be approximated by a single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138: 267-284). 
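
For illustration, the Tm approximation given above can be evaluated numerically. The Python sketch below is a direct transcription of that equation; the function name, argument names and the example values (1 M monovalent salt, 50% G+C, 50% formamide, a 600 bp hybrid) are assumptions chosen only for the example.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (in degrees C) of a DNA-DNA hybrid.

    Implements Tm = 81 + 16.6*log10(Ci) + 0.4*(%G+C)
                    - 0.6*(%formamide) - 600/n - 1.5*(%mismatch),
    where Ci is the monovalent salt concentration (mol/l) and n is the
    length of the hybrid in base pairs.
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


# Hypothetical conditions: 1 M salt, 50% G+C, 50% formamide, 600 bp hybrid.
perfect_match = melting_temperature(1.0, 50.0, 50.0, 600)
five_percent_mismatch = melting_temperature(1.0, 50.0, 50.0, 600,
                                            mismatch_percent=5.0)
print(round(perfect_match, 1), round(five_percent_mismatch, 1))
```

Under these assumed inputs the terms are 81 + 0 + 20 - 30 - 1, and each percent of mismatch lowers the estimate by a further 1.5 degrees.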

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be
conveniently altered. The temperature of the hybridization and washes and the salt concentration
during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie.
stringency), it becomes less likely for hybridization to occur between strands that are
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely
homologous with the immobilized fragment (as is frequently the case in gene family and
interspecies hybridization experiments), the hybridization temperature must be reduced, and
background will increase. The temperature of the washes affects the intensity of the hybridizing
band and the degree of background in a similar manner. The stringency of the washes is also
increased with decreasing salt concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology,
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered
and temperature adjusted accordingly, using the equation above. If the homology between the probe
and the target fragment is not known, the simplest approach is to start with both hybridization and
wash conditions which are nonstringent. If non-specific bands or high background are observed
after autoradiography, the filter can be washed at high stringency and reexposed. If the time
required for exposure makes this approach impractical, several hybridization and/or washing
stringencies should be tested in parallel.
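
The temperature guidance in the preceding paragraph can be expressed as a simple lookup, sketched here in Python for illustration. The function name is hypothetical, and because the stated homology bands share their end points (95% and 90%), the sketch assumes a boundary value belongs to the higher band. Below 85% the text gives no fixed temperature, so the sketch returns None.

```python
def hybridization_temp_c(homology_percent):
    """Convenient hybridization temperature (degrees C) in 50% formamide.

    Returns 42 for 95-100% homology, 37 for 90-95%, 32 for 85-90%, and
    None below 85%, where formamide content and temperature have to be
    adjusted case by case using the Tm equation.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


for homology in (100, 92, 87, 70):
    print(homology, hybridization_temp_c(homology))
```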

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex,
which is stable enough to be detected.

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention
(including both sense and antisense strands). Though many different nucleotide sequences will
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual
sequence present in cells. mRNA represents a coding sequence and so a probe should be
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and
so a cDNA probe should be complementary to the non-coding sequence.
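
The strand relationships described above can be illustrated with a short Python sketch. It assumes sequences are written 5' to 3' as plain strings over A, C, G and T; the function names and the six-base example sequence are hypothetical and are not taken from the sequence listing.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def probe_for_mrna(coding_strand):
    """DNA probe to detect mRNA: complementary to the coding sequence."""
    return reverse_complement(coding_strand)


def probe_for_cdna(coding_strand):
    """DNA probe to detect single-stranded cDNA.

    The cDNA is complementary to the mRNA, so it carries the non-coding
    sequence; a probe complementary to it has the coding-strand sequence.
    """
    return coding_strand


coding = "ATGGCC"  # hypothetical coding-strand fragment
print(probe_for_mrna(coding))  # GGCCAT
print(probe_for_cdna(coding))  # ATGGCC
```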

The probe sequence need not be identical to the Neisserial sequence (or its complement); some
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may
also be helpful as a label to detect the formed duplex. For example, a non-complementary
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases
or longer sequences can be interspersed into the probe, provided that the probe sequence has
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and
thereby form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions, such as
temperature, salt condition and the like. For example, for diagnostic applications, depending on the
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable
hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al. [Proc. Natl Acad. Sci. USA
(1983) 80: 7461], or using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications, 
DNA or RNA are appropriate. For other applications, modifications may be incorporated eg. 
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer 
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as 
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et 
al. (1993) TIBTECH 11:384-386]. 

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting
small amounts of target nucleic acids. The assay is described in: Mullis et al. [Meth. Enzymol.
(1987) 155: 335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence
that does not hybridize to the sequence of the amplification target (or its complement) to aid with
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such
sequence will flank the desired Neisserial sequence.
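
A primer carrying non-hybridizing sequence of the kind described above can be sketched as follows. This Python example is illustrative only: it assumes the EcoRI recognition site (GAATTC) as the convenient restriction site and a two-base 5' clamp, and the annealing sequence shown is hypothetical rather than a Neisserial sequence from the examples.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence, used only as an example


def add_five_prime_tail(annealing_seq, site=ECORI_SITE, clamp="GC"):
    """Return a primer (5'->3') made of clamp + restriction site + annealing region.

    Only the annealing region is expected to hybridize to the target; the
    clamp and site form a non-hybridizing 5' tail that ends up flanking the
    amplified sequence.
    """
    return (clamp + site + annealing_seq).upper()


annealing_region = "ATGGCCAAAGGT"  # hypothetical target-specific sequence
primer = add_five_prime_tail(annealing_region)
tail_length = len(primer) - len(annealing_region)
print(primer)       # GCGAATTCATGGCCAAAGGT
print(tail_length)  # 8
```

Keeping the target-specific bases at the 3' end matters because the polymerase extends from that end; the choice of clamp and site here is an assumption, not a recommendation from the source.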

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids are 
generated by the polymerase, they can be detected by more traditional methods, such as Southern 
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial
sequence (or its complement). 

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook 
et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified 
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid 
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed
to remove any unhybridized probe. Next, the duplexes containing the labeled probe are detected.
Typically, the probe is labelled with a radioactive moiety. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2
are molecular weight markers. Arrows indicate the position of the main recombinant product or,
in Western blots, the position of the main N. meningitidis immunoreactive band. TP indicates
N. meningitidis total protein extract; OMV indicates N. meningitidis outer membrane vesicle
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲)
shows GST control data; a circle (•) shows data with recombinant N. meningitidis protein.
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et
al (1989) J. Immunol 143:3007; Roberts et al (1996) AIDS Res Hum Retrovir 12:593; Quakyi et
al (1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc.
(1228 South Park Street, Madison, Wisconsin 53715 USA).

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N. meningitidis, along 
with their putative translation products, and also those of N. gonorrhoeae. Not all of the nucleic acid 
sequences are complete ie. they encode less than the full-length wild-type protein. 

The examples are generally in the following format:

• a nucleotide sequence which has been identified in N. meningitidis (strain B) 

• the putative translation product of this sequence 

• a computer analysis of the translation product based on database comparisons 

• corresponding gene and protein sequences identified in N. meningitidis (strain A) and in
N. gonorrhoeae

• a description of the characteristics of the proteins which indicates that they might be 
suitably antigenic 

• results of biochemical analysis (expression, purification, ELISA, FACS etc.)

The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used
to compare the ORFs (from GCG Wisconsin Package, version 9.0).

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have
been arbitrarily introduced in order to maintain a reading frame. In the same way,
double-underlined nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11) represent
ambiguities which arose during alignment of independent sequencing reactions (some of the
nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic 
domains using an algorithm based on the statistical studies of Esposti et al [Critical evaluation of 
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains 
represent potential transmembrane regions or hydrophobic leader sequences. 
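The six-frame scan described above can be illustrated with a short sketch. It is not the algorithm actually used: the Kyte-Doolittle hydropathy scale is swapped in for the Esposti et al statistics, and the window length (19 residues) and threshold (1.6) are illustrative assumptions, as are all function names.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G (assumed; no alternative table is used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values: a stand-in for the Esposti et al scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def reverse_complement(seq):
    """Reverse-complement a DNA string; non-ACGT symbols are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; codons that cannot be decoded (N, dots) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return the six conceptual translations keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{i + 1}": translate(seq[i:]) for i in range(3)}
    frames.update({f"-{i + 1}": translate(rc[i:]) for i in range(3)})
    return frames


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for each window whose mean meets the threshold."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KD.get(a, 0.0) for a in segment) / window
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits
```

Running `hydrophobic_windows` over each value returned by `six_frames` gives candidate hydrophobic stretches per frame; any hit would still need checking against the domain predictions described in the text.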

Open reading frames were predicted from fragmented nucleotide sequences using the program
ORFFINDER (NCBI).
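As a rough illustration of open reading frame prediction (not a reimplementation of ORFFINDER), the sketch below reports ATG-to-stop stretches in all six frames. The minimum length, the restriction to ATG starts, and the function names are assumptions; partial ORFs lacking a stop codon, which the examples explicitly include, are not reported by this simplified version.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse-complement a DNA string; non-ACGT symbols are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs.

    start and end are 0-based offsets on the scanned strand (end exclusive);
    min_codons counts the stop codon as well.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # resume searching after the stop
    return orfs
```

For example, `find_orfs("CCATGAAATAGGG", min_codons=3)` finds the single forward-strand ORF `ATGAAATAG`.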

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences 
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional 
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient
has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label
on the bacterial surface confirms the location of the protein.

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention:

A) Chromosomal DNA preparation 

N. meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% Sucrose, 50mM Tris-HCl, 50mM EDTA, pH8).
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamyl alcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.
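The text does not state how the OD reading was converted to a concentration. A minimal sketch, assuming the common rule of thumb that one OD260 unit of double-stranded DNA in a 1cm path corresponds to about 50µg/ml (this factor and the function names are assumptions, not taken from the text):

```python
def dsdna_conc_ug_per_ml(od260, dilution_factor=1.0, ug_per_ml_per_od=50.0):
    """Estimate dsDNA concentration from an OD260 reading of a diluted sample."""
    return od260 * ug_per_ml_per_od * dilution_factor


def total_yield_ug(conc_ug_per_ml, volume_ml=4.0):
    """Total DNA in the preparation; 4ml is the redissolving volume given in the text."""
    return conc_ug_per_ml * volume_ml
```

For instance, a reading of 0.2 on a 1:100 dilution would correspond to roughly 1000µg/ml under this assumption.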

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF, 
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A 
sequence, adapted to the codon preference usage of meningococcus as necessary. Any predicted 
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately 
downstream from the predicted leader sequence. 

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers included
an XhoI restriction site. This procedure was established in order to direct the cloning of each
amplification product (corresponding to each ORF) into two different expression systems: pGEX-KG
(using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or NheI-XhoI).

5'-end primer tail:   CGCGGATCCCATATG (BamHI-NdeI)

                      CGCGGATCCGCTAGC (BamHI-NheI)

                      CCGGAATTCTAGCTAGC (EcoRI-NheI)

3'-end primer tail:   CCCGCTCGAG (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail:   GGAATTCCATATGGCCATGG (NdeI)

5'-end primer tail:   CGGGATCC (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail:   GATCAGCTAGCCATATG (NheI)

3'-end primer tail:   CGGGATCC (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T)   (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N   (whole primer)

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
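The two melting temperature formulae can be written out directly. The sketch below is illustrative only; the function names are assumptions, and G, C, A and T are taken to be base counts, %GC the percentage of G+C, and N the primer length in nucleotides.

```python
def tm_hybridising_region(seq):
    """Tm = 4(G+C) + 2(A+T), applied to the hybridising region only (tail excluded)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(seq):
    """Tm = 64.9 + 0.41(%GC) - 600/N, applied to the whole primer including its tail."""
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 64.9 + 0.41 * gc_percent - 600.0 / n
```

In use, the hybridising region of a candidate primer would be lengthened or shortened until `tm_hybridising_region` falls at about 50-55°C and `tm_whole_primer`, evaluated on tail plus hybridising region, falls at about 65-70°C.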

Table I (page 487) shows the forward and reverse primers used for each amplification. In certain 
cases, it will be noted that the sequence of the primer does not exactly match the sequence in the 
ORF. When initial amplifications were performed, the complete 5' and/or 3' sequence was not
known for some meningococcal ORFs, although the corresponding sequences had been identified
in gonococcus. For amplification, the gonococcal sequences could thus be used as the basis for
primer design, altered to take account of codon preference. In particular, the following codons were
changed: ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC;
CGG→CGC; GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be
appreciated that, once the complete sequence has been identified, this approach is generally no
longer necessary.
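The codon changes listed above are all synonymous substitutions, so they can be applied mechanically to an in-frame coding sequence. A minimal sketch follows; the function name is hypothetical and the input is assumed to start at the first base of a codon.

```python
# Codon replacements taken from the list in the text (gonococcal codon -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Apply the codon changes to an in-frame sequence; other codons are left as they are."""
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)
```

For example, `adapt_codons("ATACAGGGGATG")` gives `ATTCAAGGCATG`. Because the lookup is per codon, the reading frame of the hybridising region must be known before the substitution is applied.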

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-Acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template
in the presence of 20-40µM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units Taq DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
using the hybridization temperature of the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed according to the hybridization temperature of the whole length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

                   Denaturation        Hybridisation          Elongation

First 5 cycles     30 seconds, 95°C    30 seconds, 50-55°C    30-60 seconds, 72°C

Last 30 cycles     30 seconds, 95°C    30 seconds, 65-70°C    30-60 seconds, 72°C

The elongation time varied according to the length of the ORF to be amplified. 

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR 
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose 
gel and the size of each amplified fragment compared with a DNA molecular weight marker. 

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the right size band was then eluted and purified from the gel, using the Qiagen Gel
Extraction Kit, following the instructions of the manufacturer. The final volume of the DNA
fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
of the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI, SalI/PstI for cloning into pGex-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in either a 30 or 40µl final volume in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-HisA, and pGex-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50ng/µl. 1µl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream to the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated in both pET22b
and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using 0.5µl
of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the manufacturer.
The reaction was incubated at room temperature for 3 hours. In some experiments, ligation was
performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's instructions.
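The 3:1 fragment/vector molar ratio translates into a mass of insert that depends on the relative lengths of insert and vector. The sketch below shows the standard conversion; the function name and the example sizes are illustrative assumptions, and only the 50ng of vector per ligation is taken from the vector digestion step.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert giving the requested insert:vector molar ratio.

    Equal masses of DNA contain molecule numbers inversely proportional to length,
    so the insert mass scales with insert_bp / vector_bp.
    """
    return vector_ng * (insert_bp / vector_bp) * molar_ratio
```

With 50ng of a hypothetical 5000bp vector and a hypothetical 1000bp insert, `insert_mass_ng(50, 5000, 1000)` gives about 30ng of fragment.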

In order to introduce the recombinant plasmid in a suitable strain, 100µl E. coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTRC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). The screening of the
positive clones was made on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µl/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until the OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, protein expression was induced by
addition of 1mM IPTG, whereas in the case of the pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed and centrifuged in a microfuge;
the pellet was resuspended in PBS and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.

I) His-fusion solubility analysis (ORFs 111-129) 

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 needed urea and ORFs 125 and 127 needed guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on a LB + Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in 2ml
buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5) and
treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40 minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached the OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
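The concentration formula can be applied directly to a pair of absorbance readings, as in the following sketch (the function name is an assumption; the coefficients are those given in the formula).

```python
def protein_mg_per_ml(od280, od260):
    """Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)."""
    return 1.55 * od280 - 0.76 * od260
```

For example, readings of OD280 = 1.0 and OD260 = 0.5 give approximately 1.17mg/ml. The OD260 term corrects for nucleic acid carried through the purification.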

L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis) 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% Glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were allowed to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200µl of diluted sera (Dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml
of citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490
was followed. The ELISA was considered positive when OD490 was 2.5 times that of the respective
pre-immune sera.
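The positivity criterion can be expressed as a simple comparison. In the sketch below, reading "2.5 times" as "at least 2.5 times" is an assumption, as is the function name.

```python
def elisa_positive(od490_immune, od490_preimmune, fold=2.5):
    """True when the immune serum OD490 is at least `fold` times the pre-immune OD490."""
    return od490_immune >= fold * od490_preimmune
```

For example, an immune reading of 0.5 against a pre-immune reading of 0.1 would be scored positive, whereas 0.2 against 0.1 would not.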

O) FACScan bacteria Binding Assay procedure.

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes containing 8ml each Mueller-Hinton Broth (Difco) containing 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
allowed to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.

P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 20mM
Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted by
sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed by
centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by centrifugation
at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from the crude outer
membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and incubated at room
temperature for 20 minutes. The suspension was centrifuged at 10000g for 10 minutes to remove
aggregates, and the supernatant further ultracentrifuged at 50000g for 75 minutes to pellet the outer
membranes. The outer membranes were resuspended in 10mM Tris-HCl, pH8 and the protein
concentration measured by the Bio-Rad Protein assay, using BSA as a standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting 

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and allowed to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.
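The text does not say how the two colony counts were combined. One common way to summarise such data, offered here only as an assumed illustration, is the percentage of colony-forming units surviving after one hour relative to time 0, compared between wells receiving normal and heat-inactivated complement. The function name is hypothetical.

```python
def percent_survival(cfu_time0, cfu_time1):
    """Colonies at 1 hour expressed as a percentage of colonies at time 0."""
    if cfu_time0 <= 0:
        raise ValueError("time 0 colony count must be positive")
    return 100.0 * cfu_time1 / cfu_time0
```

For example, 80 colonies at time 0 and 20 colonies at time 1 correspond to 25% survival; a value close to or above 100% in the heat-inactivated complement well would indicate that any killing depends on active complement.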

Table II (page 493) gives a summary of the cloning, expression and purification results.

Example 1

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 1>:

1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT

51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 A.GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG

151 TAT.TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG

201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG

251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA

501 AGACCG. 

This corresponds to the amino acid sequence <SEQ ID 2; ORF37>: 

1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM 
51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA 
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 
151 AQNNLGVMYA ERXRVRQD.. 
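
The correspondence between a partial DNA sequence and its amino acid sequence can be checked with a short translation routine. This is a minimal sketch assuming the standard genetic code and reading frame 1; treating any codon that contains an undetermined base (`.` or `N`) as `X` is an assumption that happens to reproduce the `X` positions shown above. The name `translate` is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with unknown bases become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spacing used in the listing
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 1 as listed above.
print(translate("ATGAAACAGA CAGTCAA.AT GCTTGCCGCC"))  # MKQTVXMLAA
```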

Further work revealed the complete nucleotide sequence <SEQ ID 3>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 
51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 
101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 
151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 
201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 
251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 
301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 
351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 
401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA 
451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA 
501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 
551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 
This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>: 

1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM 
51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA 
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ 
151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY* 

Further work identified the corresponding gene in strain A of N.meningitidis <SEQ ID 5>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC 

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>: 

1 MKQTVKWLAA ALIALGLNQA VWADDVSDFR ENLQAAAQGN AAAQNNLGVM 
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75aa 
overlap with ORF37a: 

10 20 30 40 50 60 

20 orf37 pep MKOTVX MLAAALIALGLNRPVWX DDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 

TTTTi 1 M ! I I II I II: II I ! II M I I II MMMIIH MI:M :l 11:1 

orf37a MKO T VKW LAAA L I ALG LN Q AW A D DVSDFRENLQAAAQGN AAAQNNLGVM YAERRGVRQD 
10 20 30 40 50 60 

25 70 80 90 100 110 120 

orf 37 . pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGWQAQYNLG 

i I : I : : : I 
orf 37 a RALAQEWLGKACQNGYQDSCDN DQRLKAG YX 

70 80 90 
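
A percent-identity figure of the kind quoted for these alignments can be recomputed from two aligned strings. The following is a sketch under stated assumptions: identity is taken as identical columns divided by the total number of aligned columns, and columns containing a gap (`-`) or an undetermined residue (`X`) are never counted as matches. The alignment program used in the source may count the overlap differently, so small discrepancies are possible.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns holding the same determined residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a, aligned_b)
        if a == b and a not in "-X"
    )
    return round(100.0 * matches / len(aligned_a), 1)


# First ten aligned residues of ORF37 (partial) and ORF37a.
print(percent_identity("MKQTVXMLAA", "MKQTVKWLAA"))  # 80.0
```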

Further work identified the corresponding gene in N.gonorrhoeae <SEQ ID 7>: 

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 
51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 
101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG 
151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 
201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 
251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 
301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 
351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A 

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>: 

1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM 
51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA 
101 QQWLGKACQN GDQNSCDNDQ RLKAGY* 

The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa 
overlap with ORF37ng: 

45 orf37 pep MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD 60 

Mill I I I I I I I II I I : II M II I It I I I I I 1 I I M I : I I I : I I : I I : I 
orf37ng MKQT VKW LAAAL I ALG LNQAVW AG DVSDFRENLQAAEQGN AAAQFNLGVM YENGQGVRQD 60 

orf 37 pep DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGWQAQYNLG 120 
50 ' : : I I : I I I : : II I I I I I I I I I I I : I I M I I : I : I : I I 

orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQN GDQNSCDNDQ 120 

or f 37 . pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD 1 68 
55 orf37ng RLKAGY 126 






WO 99/24578 PCT/IB98/01665 


The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap: 

10 20 30 40 50 60 

orf 37-1 . pep MKQTVKWIAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD 
I I | I M I I I I I I I I I I I I : I I I I I I I I I I I I ! I II I I I I I II : I I I : II : I : I 1 I : I 
5 orf37ng MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 37- 1 . pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGWQAQYNLG 
10 11 : : I I : I M : I : I I I I I I I I I I I ! I : I I I I I I I 

orf37ng YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD 

70 80 90 

130 140 150 160 . 170 180 

15 or f 37 - 1 . pep VI YAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC 

I I I i : I : I I I I 

orf37ng LALAQQWLGKAC 

100 

20 190 199 

orf37-l .pep QNGDQDGCDNDQRLKAGYX 
I I II I : : I II I I I I I 11 I I 
orf37ng QNGDQNSCDNDQRLKAGYX 
110 120 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF37-1 (11kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the 
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a 
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1. 
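
A hydrophilicity plot of this general kind is a sliding-window average of per-residue values along the sequence. The sketch below is an illustration only: the text does not say which scale or window length was used for Figure 1E, so the Kyte-Doolittle hydropathy scale (negated to give hydrophilicity) and the 7-residue window are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilicity_profile(protein: str, window: int = 7) -> list[float]:
    """Mean negated hydropathy over each window of consecutive residues.

    Residues without a scale value (for example 'X' or '*') are skipped.
    """
    residues = [r for r in protein.upper() if r in KYTE_DOOLITTLE]
    profile = []
    for start in range(len(residues) - window + 1):
        segment = residues[start:start + window]
        mean_hydropathy = sum(KYTE_DOOLITTLE[r] for r in segment) / window
        profile.append(-mean_hydropathy)
    return profile


# N-terminal 23 residues of ORF37-1 (the putative leader region).
leader = "MKQTVKWLAAALIALGLNRAVWA"
print([round(v, 2) for v in hydrophilicity_profile(leader)])
```

Strongly negative values in such a profile mark hydrophobic stretches, which is the pattern expected for a leader sequence.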

Example 2 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 9>: 

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 
GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 
TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 
ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 
GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 
CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 
TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 10>: 

1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 




101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a hypothetical H.influenzae protein (ybrd.haein; accession number p45029) 

SEQ ID 9 and ybrd.haein show 48.4% aa identity in 122 aa overlap: 

20 30 40 50 60 70 

v-bd h LGIGALVFLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGWIGRVSAITLDE 

I : : I i II I I : I I : i : I I : : I I I i I I : I I 
vr m FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
N ' in 10 20 30 



80 90 100 110 120 130 

vrbd h KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT 
. | | | : : | : : : : : : I : : : : : 1 I I I I I I I M I I : I 1 111:1:1=1 I 
N m KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDTISVT 
15 40 50 60 70 80 

140 150 160 

yrbd . h TSAMVLEDLIGQFL — YGSKKSDGNEKSESTEQ 
: I I I I I I : I M : I : : : : I : : II : : 
20 N m S S AMVLENLI GKFMT S FAEKNADGGNAEKAAEX 

90 100 110 120 

Homology with a predicted ORF from N. gonorrhoeae 

SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N. gonorrhoeae: 

95 20 30 40 50 60 70 

vrbd GAAAVAFLAFRVAGGAAFGGSDKTYAVYAD FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

y I 1 I I I 1 1 I f I I 1 I I I I I ! 1 I I 1 1 I I 1 I 1 1 I 

vj _ FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 
N ' m 10 20 30 

30 

80 90 100 110 120 130 

vrbd KSYQARVRLDLDGKYQFSSDVSAOILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
I I I 1 I I I I I II I I I I M I 1 I I I I I 1 I I I I I I I I i I I I I M I M I t I I I I I 1 I I It I M I I 
N m KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 
35 * * 40 50 60 70 80 90 

140 150 160 

yrbd VLEN LI GKFMT SFAEKNAEGGNAEKAAEX 

I I M 1 I I I I I I i I I I I I I : I I I I M 11 I I 
40 N.m VLENLIGKFMTS FAEKNADGGNAEKAAEX 

100 110 120 

The complete yrbd H. influenzae sequence has a leader sequence and it is expected that the 
full-length homologous N. meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 

Example 3 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 11>: 

1 ..ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 
51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 
101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 
151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 
201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 
251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 
301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG 
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 
401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 
451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 
501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 
551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 
601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 
651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 
701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 
751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 
801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 
851 AAGCGGTCG. 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 

1 .ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG 
51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 
101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 
151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE 
201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 
251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV.. 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 



1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 
101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 
201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 
251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 
301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 
351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 
601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 
651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 
701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 
851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 
901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 
1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 
1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 
1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 



This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 
51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI 
101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 EKFACDVWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 
201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL 
251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT 
301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS 
351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLPRKNPET STA* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. 
meningitidis: 

10 20 30 
ILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
1 i M | I I I I I I ! I I M I I E I I I I I I II I I I 1! I I 
MSKFFKRLFDIVAS ASGLIFLSPVFLILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

40 50 60 70 80 90 

SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 

||:|:| Mil I I I I I I I M I I I I I I I I I I I I I I I I I : I I I : I I I I 1 I I M I I M f I I 
SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

100 110 120 130 140 150 

YnNmMRRHF.MKPGITGWAOVNGRNALSWDEKFACDVWYIDHFS LCLDIKILLLTVKKVL 
| | I t I I I I I I I I SI I I I I I I I I I I II I I I I I : I I I I : I M ! i I I I I I I I I I I I II I I II I 
YnNFQNRRHF.MKPGITGWAOVNGRNALSWDERFACDIWYIDHFS LCLDIKILLLTVKKVL 
130 140 150 160 170 180 

160 170 180 190 200 210 

I KEG I S AQGEXTMP PFTGKRKLAWGAGGHGKWADLAAALGRYRE I VFLDDRAQGS VNG 
Tl I I I i II I I I I M I I l I II I M I M H I I I I I I : I II I I I I I II II I I I : I I I II I 
IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 

220 230 240 250 260 270 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
I | | M I I I M I I I I I 1 I : I : I M I I I I I I I I i I I II I I I M I I I M I : I I! : I I I I I I I 
FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
250 260 270 280 290 300 

280 

VGQGSWMAKAV 
I M I : I I i II I I 

VGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
310 320 330 340 350 360 

The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 
101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 
201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 
251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 
301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 
351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 
601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 
651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 
701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 
851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 
901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 
951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 
1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 
1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 
1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 
51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 
101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 ERFACDIWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 
201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined. 



ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 

10 20 30 1 40 50 60 

orf 3a . pep MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
I | || M I I I I I II I I I II I I I I I I I I I II I I I I I I I I II I I I I I 1 I I I I It I I II M I I I 
orf 3-1 MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

70 80 90 100 ' 110 120 

orf 3a . pep SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
11:11111111 I I I I I I II I I I I I II M I I I I I I II II : I I I : M I I I M I II I I I I I I 
orf 3-1 SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 3a . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 
I I M I M I II II I I I M I I I I I I I II M I I I : I I I I : I I II I I I I I M M I I I I I I I I I I 
orf 3-1 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 3a . peo IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
I M I 11 II I I I I I I I I M I I II I I I I I M I I I I I I : I I I I I I I I I I II I I 1 : I I I I I I 
orf 3-1 IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 3a . pep FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
I I I I M I I I I I I I M I I : I : I I 1 M I 11 I I I I I I I I II I I I I I I I I I : II I : I I I II I I 
orf 3-1 FSV IGTTLLLEN S LS PEQYDVAVAVGNNRIRRQIAEKAAALG FAL PVLVH PDATVS PSAT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 3a . pep VGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
I I I j : I ! I I I I I I I I I I I I I I I I M I I I I I 1 I I I I I : I I I 1 I I I I I 1 I M II : I I I I I I 
orf 3-1 VGQGSVVMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 

370 380 390 400 410 

orf 3a . pep IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLAGKNTETLRSX 

I I I I I I I I I I I I I I I M I I I I I II I M I I M I I I I I M I I I II II II 
orf 3-1 IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 

370 380 390 400 410 

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

ORF3 3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62 

I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 
yvfc 27 IAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86 

ORF3 63 ASXDELPELWNILKGEMSLVGPRPLLMQYLPLYDNFQNRRHEMKPGITGWAQVNGRNALS 122 

S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 
yvfc 87 LSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAIS 146 

ORF3 123 WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172 

W++KF DVWY+D++S LD EGI T FTG 

yvfc 147 WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 196 
Homology with a predicted ORF from N. gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from N. 
gonorrhoeae: 

orf3 
orf 3ng 
orf3 
orf 3ng 
orf3 
orf 3ng 
orf3 
orf 3ng 
orf3 
orf 3ng 
orf 3 



ILIYLI RKNLGS PVFFFQERPGKDGKP FKMVKFR 3 4 

: I I I I I I i I I I I M I : : I ! I I I I M I II 

MSKAVKRLFDIIAS ASGLIVLSPVFLVLIYLI RKNKGSPVrFIRERPGKDGKPFKMVKFR 60 

SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGSMSLVGPRPLLMQYLPL 94 

| M l : | | | | | | | | | : | | I I 1111111:1 11 I I I I I I : I I 1 II I I I I II I II I I I I I I 

SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 154 

| : : t I I I I I M I I I i M I I 1 I I I I ! I I I I I I M : I M I I 1:11: II : I If : I I I I I I I 

YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 180 

I KEG I S AQGEXTMP P FTGKRKLAWGAGGHGKWADLAAALGRYRE IV FLDDRAQG S VNG 214 
j | | | | | | | | | | | | | I : I : 1 1 I ! I : I I I I I I I I M : I I I I I I I I II M I II : M I I I I 

IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKWAELAT^ALGTYGEIVFLDDRTQGSVNG 24 0 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 

I | M I I M I I M 1 I 1 I I : I : : M I I I I I I I I I I : I : i I I I I i I II I : M II I I I I I I 

FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 300 

VGQGSWMAKAV 286 
: M 1 ! I II I I I I 

IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 



orf3ng 

The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 
51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 
101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 
151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 
201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 
251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 
301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 
351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 
501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 
551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 
601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 
651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 
701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 
851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 
901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 
1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 
1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 
1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 
1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 



This encodes a protein having amino acid sequence <SEQ ID 18>: 



1 MSKAVKRLFD IIASASGLIV LSPVFLVLIY LIRKNLGSPV FFIRERPGKD 
51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 
101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 
201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 
301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA 
401 KPLTGKNPKT GTA* 
This protein shows 86.9% identity in 413 aa overlap with ORF3-1: 



10 20 30 40 50 60 

orf 3- 1 peD MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
|jj I t 1 I I I : I I I I I I I II II I I : (I I I I I I M I I I I It : : I I I I I I II I I I I I I I I 
5 orf3ng MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 3-1 pep SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 
10 ' ' I I I I I I I M I I I II I : I I I I I I I I I M : I I I I I I I I ! I : I I I M I I I I I M I I I I I I I I 

orf3ng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 , 160 170 180 

15 orf 3-1 . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 

I : : || | ! | | II I I I I I I I II I I I i I II 1 I I I I I : I II I I I : M : I I : II I : II I I I I I 
or f3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 

130 140 150 160 170 180 

20 190 200 210 220 230 240 

orf 3-1 . pep IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
I I I I M I I I II I 1 I I I : I : I I I I I : M ! I 11 I I II :! I I I I I I I I I I M I I : I I M M 
orf3ng IKEGISAQGEATMPPFAGNRKIAVIGAGGHGKWAELAAALGTYGEIVFLDDRTQGSVNG 

190 200 210 220 230 240 

25 

250 260 270 280 290 300 

or f 3- 1 . peo FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

I I I I I I I I I M I M I II : I : : I I I I I I I I M M : I : I I I I M I I I I : I i I I I I I I I I 
orf3nc FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQ1TENAAALGFKLPVLIHPDATVSPSAI 
30 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 3-1 . oep VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 
: 1 I 1 I | I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I I I : M I M I I I 1 I II I II : I I I I I 
35 orf3ng IGQGS WMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHI S PGAHLSGNTRIGEESR 

310 320 330 340 350 360 

370 380 390 400 410 

orf 3-1 . peD IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 
40 * I I I I II I I I I : I !: I I I M I : I I : I I I I I I I II I I I I I I I : I : I I I 

orf 3ng IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGN PAKPLTGKNPKTGTAX 

370 380 390 400 410 

In addition, ORF3ng shows significant homology with a hypothetical protein from B.subtilis: 

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis] 
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis] 

>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide 
biosynthesis [Bacillus subtilis] Length = 202 
Score = 235 bits (594), Expect = 3e-61 
Identities = 114/195 (58%), Positives = 142/195 (72%) 



50 



Query: 5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFI RERPGKDGKPFKMVKFRSMRD 64 

+KRLFD+ A+ h S + L I ++R +GSPVFF + RPG GKPF + KFR+M D 
Sbjct: 3 LKRLFDLTAAIFLLCCTSVIILFTIAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62 



55 Query: 65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124 

DS G LPD . RLT G+ +R S+DELP+L NVLKG++SLVGPRPLLM YLPLY + 
Sbjct : 63 ERDSKGNLLPDEVRLTKTGRLIRKLSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEK 122 

Query: 125 QNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVLIKEG 184 
60 Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG 

Sbjct: 123 QARRHEVKPG I TGWAQINGRNAI SWEKKFELDVWYVDNWS FFLDLKI LCLT VRKVLVSEG 182 

Query: 185 I S AQGEATM P P FAGN 199 
I T F G+ 

65 Sbjct: 183 IQQTNHVTAERFTGS 197 
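
The percentages in the BLAST summary above follow directly from the reported counts. A minimal sketch, assuming the report truncates to a whole percentage rather than rounding (an inference from the figures themselves, since 142/195 is about 72.8% yet is reported as 72%); the function name is illustrative.

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Whole-number percentage obtained by integer truncation (assumed)."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


print(blast_percent(114, 195))  # identities: 58
print(blast_percent(142, 195))  # positives: 72
```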
The hypothetical product of yvfc gene shows similarity to EXOY of R.meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N.gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 4 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 19>: 

1 ..AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 
51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 
101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 
151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 
201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT.GGCGG CCTGGTCATT 
251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 
301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 
351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 
401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 .NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS .... XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 
51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 
101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 
151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 
201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 
251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 
301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 
351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 
451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 
501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 
551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 
601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 
651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 
701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 
751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 
801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 
851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 
201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 
251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 23>: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 
51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 
101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 
151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 
201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 
251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 
301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 
351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 
451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 
501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 
551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 
601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 
651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 
701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 
751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 
801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 
851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 
901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 

1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 

10 20 30 

orf 5 . pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

I I I I I I I I I M I I I I I I I I I I I I I I I I I : I 
orf 5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI

130 140 150 160 170 180 

40 50 60 70 80 90 

orf 5 . pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
30 I I I I II I : I I I I 1 I I I I : : I II I I II M I I I I : M I I I I I I I I I I I III : I I I 

orf 5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 

100 110 120 130 

orf 5 . pep RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV

I I I I I I I I I I I 1:1 I I I I I I I I I I I I I I I 
orf 5a RARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 
250 260 270 280 290 300 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 

10 20 30 40 50 60

orf 5a . pep MDGAQPKTNFXXRL I ARLAREPDSAEDVLT LLRQAHEQEV FDADTLLRLE KVLDFSDLEV 
I II I I II II I M I I I I I I I I I I I I I I I :! I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf 5-1 MDGAQPKTNF FERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 


70 80 90 100 110 120 

orf 5a . pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP
I I M I I I) I 1 I II It I I I II M I 11 t I 1 I I I M I I I I I I I I I I 1 I I I I I I II I M I I I I I 
orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
70 80 90 100 110 120

130 140 150 160 170 180 

orf 5a. pep EQ FHLKSILRPAVFVPEGKSLTALLKE FREQRN HMAI VI DEYGGTSGLVT FEDIIEQIVG 
M I I M I I I If II I I I M 1 I M I I I I M M I I I I I I I I I I II I II I I I I I I II I I I II I I 
orf 5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 5a . pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 
60 : I I I I I I I I : I I M II I i I : I I I I I E I M I I II M : i I I I I I I I I I I I II III : II 

orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINT FFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 

250 260 270 280 290 300
orf 5a . pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT
I I I I j I I Ml I I t: 1 M I I I I I I 1 I I I I I I : II I : I I I M I II I I I I I I I I i I 
orf 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT
240 250 260 270 280 290 

Further work identified a partial DNA sequence in N. gonorrhoeae <SEQ ID 25> which encodes
a protein having amino acid sequence <SEQ ID 26; ORF5ng>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

15 51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

20 301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

4 01 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

4 51 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 

501 TTTGGTCACC TTTGAAGACA T CATC GAG CA AATCGTCGGT GACATCGAAG 

25 551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 

601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 

651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 

7 01 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 

7 51 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 

30 801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 

851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 

901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>:

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 

301 IRQT*

The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa 
overlap with the partial gonococcal sequence (ORF5ng): 

orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 30 

I I I 1 I I I I I I I II I I I I I! M I I I I I II : I 
orf5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

I | | | | 1 | : | || : I I : I I : : II I M I I I I I I I I : I I I I I I : I I I I I I I III : I I I 
orf5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242


orf5 RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSX RRFCTV 131

II I | | M I M I M I I I II I II It : I I I I I I I I i MINI 

orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in
304 aa overlap:

                     10        20        30        40        50        60
orf5ng-1.pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV
             |||||||||||||||||||||||||||||||||||||||||||||| ||||||||::|||
orf5-1       MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf5ng-1.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf5-1       RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf5ng-1.pep EQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
             |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf5-1       EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf5ng-1.pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT
             :||||||||:|||:||:||:|||||||||||||||:||||||:||||||||  ||| :||
orf5-1       EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT
                    190       200       210       220       230

                    250       260       270       280       290       300
orf5ng-1.pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRSFSVSIRP
              ||||||||||||||||| |||||||:|||||||||    ||||||||||||:|||||||
orf5-1       SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS----TAVSAQFRMTVRAFSVSIRP
           240       250       260       270           280       290

orf5ng-1.pep IRQTX
             |||||
orf5-1       IRQTX
               300



Computer analysis of these amino acid sequences indicated a putative leader sequence, and
identified the following homologies:

Homology with hemolysin homolog TlyC (accession U32716) of H. influenzae
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp).

ORF5    2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61
          HMAIV+DE+G  SGLVT EDI+EQIVG+IEDEFDE++ AD I  +S  T+ + A T+I+D
TlyC  166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224

ORF5   62 INTFFGTEYSIEEADTI 78
           N  F T++  EE DTI
TlyC  225 FNAQFNTDFDDEEVDTI 241

ORF5ng-1 also shows significant homology with TlyC:


SCORES Initl: 301 Initn: 419 Opt: 668 

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 



orf 5ng-l .pep 



10 20 30 40 50 

MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK 
I i I : I : : I : : I : I :::::: I :::::::: I : I : I 
tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 

10 20 30 40 50 60 



60 70 80 90 100 109 

orf 5ng-l . pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE — DKDEVLGILH 
I : : : I I I : I I I II I I : : :::::::: : I : : I I I I I I I I : : I : I : : : I I I I 

tlyc_haein VMEIAELRVRDIMIPRSQI I FIEDQQDLNTCLNTI IESAHSRFPVIADADDRDNI VGILH 

70 80 90 100 110 120


110 120 130 140 150 160 

orf 5ng-l . pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 
MINI:: : I I I : I : I I I : I : I I I : I : : I I : I I : I I I I I \ : II : I : : I I I 
tlyc_haein AKDLLKFLREDAEVFDLSSLLRPVVIVPESKRVDRMLKDFRSERFHMAIVVDEFGAVSGL
130 140 150 160 170 180

170 180 190 200 210 220 

orf 5ng-1 . pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD

I | : I I I : I II I I I I I I I I M : I II I : : : I : s : : I I : I s I : I I I : I : : = M : I 
tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD

190 200 210 220 230

230 240 250 260 270 280 

orf 5ng-l . pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF 

tlvc haein tIGgLmQTFGYLPKiUeEIILKN^^ 

y - o^n o^n 9fi0 270 280 290 



240 250 260 270 280 

Homology with a hypothetical secreted protein from E.coli:

ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292

Score = 212 bits (533), Expect = 3e-54

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%)

Query:   2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 60
           D    K  F   L+++L   EP + +++L L+R + + ++ D DT   LE V+D +D  V
Sbjct:  10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69

Query:  61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119
           RD MI RS+M  LK N +++     +I++AHSRFPVI EDKD + GIL AKDLL +M  +
Sbjct:  70 RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD 129

Query: 120 PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV 179
            E F +  +LR AV VPE K +  +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229
           G+IEDE+DE++  D    +S   W + A   IED N  FGT +S EE DT
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238
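
The identity and gap figures in the BLAST header above can be recomputed from the aligned rows. The following is a minimal sketch, not the BLAST implementation: it counts a column as an identity when both rows carry the same residue, counts a column as a gap when either row carries "-", and truncates percentages to whole numbers, which is assumed here because it reproduces the reported 48% and 1%. The function name `alignment_stats` is illustrative. One subject residue run that appears as W in the scanned text is entered as VV, an assumption made so that the row matches its 60-residue coordinate range.

```python
def alignment_stats(query: str, subject: str):
    """Return (columns, identities, gaps) for two equal-length aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    columns = len(query)
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return columns, identities, gaps


# ORF5a (query) against the E. coli hypothetical protein (subject), row by row.
QUERY = (
    "DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV"
    "RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN"
    "PEQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIV"
    "GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT"
)
SBJCT = (
    "DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV"
    "RDIMIPRSQMITLKRNQTLDECLDVIIESAHSRFPVISEDKDHIEGILMAKDLLPFMRSD"
    "AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV"
    "GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT"
)

columns, identities, gaps = alignment_stats(QUERY, SBJCT)
print(f"Identities = {identities}/{columns} ({100 * identities // columns}%)")
print(f"Gaps = {gaps}/{columns} ({100 * gaps // columns}%)")
```

The "Positives" figure is not recomputed here because it depends on the substitution matrix used by the search, which the text does not state.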

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or
diagnostics.

ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E.coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used 
to immunise mice, whose sera were used for Western blot analysis (Figure IB). These experiments 
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 5 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 29>:

1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC
151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA
201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAcjG 
251 GCGATGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC 
5 301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA 

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 
401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 
451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGhCC^GCGC 
501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 
10 551 GATTGCGCTG CCC. . 

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>:

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

15 151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP.. 



Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>:


1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

20 151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

2 51 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CG G AC AC G AC ACCAAAGGCT 

25 4 01 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC 

4 51 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAAT CG ATGCGGGCGG 

501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

30 651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

7 01 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

7 51 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

8 01 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 
851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

35 901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

40 101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 
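
The way the partial ORF7 sequence sits inside the complete ORF7-1 sequence can be checked with a short sketch. It assumes that X in the partial sequence marks an undetermined residue that may match any residue in the complete sequence. The function name `find_fragment` is illustrative, and only excerpts of the two sequences are used: residues 101-150 of ORF7 and residues 151-250 of ORF7-1.

```python
def find_fragment(fragment: str, complete: str, wildcard: str = "X") -> int:
    """Return the 0-based offset of fragment in complete, or -1 if absent.

    A wildcard position in the fragment matches any residue.
    """
    for offset in range(len(complete) - len(fragment) + 1):
        window = complete[offset:offset + len(fragment)]
        if all(f == wildcard or f == c for f, c in zip(fragment, window)):
            return offset
    return -1


# Residues 101-150 of the partial sequence ORF7 (contains undetermined X).
ORF7_101_150 = "PYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY"

# Residues 151-250 of the complete sequence ORF7-1.
ORF7_1_151_250 = (
    "NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNP"
    "YEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAA"
)

offset = find_fragment(ORF7_101_150, ORF7_1_151_250)
print("ORF7 residue 101 corresponds to ORF7-1 residue", 151 + offset)
```

Under these assumptions residue 101 of ORF7 maps to residue 196 of ORF7-1, an offset of 95 residues, which agrees with ORF7 beginning at MRGGRPDSVT (residues 96-105 of ORF7-1).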

Computer analysis of this amino acid sequence gave the following results:

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H. influenzae
ORF7 and yceg proteins show 44% aa identity in 192 aa overlap:

ORF7   1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA-----EVAPDAFSG 55
         +  G+     V+ IEG  F   RK ++  P +    K  SNE++ A     ++  +
yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161

ORF7  56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115
         N EG  +PD+Y      +DL++ + + + M++ LN+AW  R + LP  NPYEMLI+A +V
yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175
         EKETG       VASVF+NRLK  M+LQT  +VIYGMG  Y G IRK DL   TPYNTY
yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281

ORF7 176 RGGLPPTPIALP 187
           GLPPTPIA+P
yceg 282 IDGLPPTPIAMP 293

The complete length YCEG protein has sequence: 

1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of N. meningitidis:

10 20 30 

orf7 pep MRGGRPDSVTVQI IEGSRFSHMRKVI DAT P 

I I I I I 1 I ! I I I M I I II I I I I M I I I I II I 
orf 7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQI IEGSRFSHMRKVI DATP 

20 70 80 90 100 110 120 

40 50 60 70 80 90 

orf7 pep DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 

II | j | II M I I I I I I I I I I I I M I I I I I I I I I ! I I I I M I I I I I : I I I I I I I I I I I I I 
?5 orf7a DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 

130 140 150 160 170 180 

100 110 120 130 140 150 

or ^7 pep EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 
30 " ~ II I I M I 11 I I I II I I I I II I I I : I I II II I I I I I I 1 M I I I I I I I I I i I 1 II I I 

or f7a EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 
190 200 210 220 230 240 

160 170 180 

35 orf 7 . pep GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 

I M I M I I I I I M II I I I I I I I I I I I I I I I I I I I I I I 
orf 7a GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 
250 260 270 280 290 300 

40 orf7a DGTGLSQFSHDLTEHNAAVRKYI LKKX 

310 320 330 

The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

45 101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

2 51 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

50 351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT 

4 01 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

4 51 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

55 601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 

60 851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA
This is predicted to encode a protein having amino acid sequence <SEQ ID 34>:

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 



A leader peptide is underlined. 



ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap:



10 20 30 40 50 60 

orf 7a . pep MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 

I I I M I I It I I I I I I M I I I I I I M M II I M M II M I I I II I I I I I I I I I I I M M I I 
orf 7-1 MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 

10 20 • 30 40 50 60 

70 80 90 100 110 120 

orf 7a . pep HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV

II I I I I I I I I I II I II I I M I I I I I I I I I I I I U I I I I I I I M I II I I I I I I M II M I I 
orf 7-1 HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 7a . pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 
, II I II I I II I I I I I I M I I I II I I! I I I I I I I M I I I I I I I I I I I I I i I : I I I I I I 1 I 
orf 7-1 IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 7a .pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 
M I I I I I I I I I M M II M 1 I I M I II I I : I I II I M II I I I I I I I I I I I I I I I I I I I I I 
orf 7-1 QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 7a . pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
I I M I I I I I 11 I I I I I I I I I I t II I I M I I I I I I II I I I I I I I I M I I I I I I I I II I I I I 
orf 7-1 PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 

250 260 270 280 290 300 

310 320 330 

orf 7a . pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 
i I I I I I I I I i I I I I I I I I I I I I I I ! I I I I I 1 I 
orf 7-1 FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 

310 320 330 

Homology with a predicted ORF from N. gonorrhoeae

ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7ng) from N.
gonorrhoeae:



orf7    MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng  MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60

orf7    FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120
        ||||||||||||||||||||||||||||||||| :||||||||||||||||| |:|||||
orf7ng  FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120

orf7    HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180
        |||  |||||||||||||||||||  ||||||||||||||||||||||||||||| ||||
orf7ng  HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180

orf7    PTPIALP 187
        || ||||
orf7ng  PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid 
sequence <SEQ ID 36>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>:

1 taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGT CGG TCGGCAGGAA 

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG 

15 201 GCCGGATTCC GTTACCGTGC AG AT TAT C G A AGGTTCGCGT TTTTCGCATA 

2 51 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC 

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG 

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG 

4 01 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC 

20 4 51 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT AT AAAAAC C C 

5C p TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG 

55 1 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC 

60i GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC 

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA 

25 701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC 

75* Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa 

801 rttcgcgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT 

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT AT ATT T T G AA AAAATAA 

This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>:

1 YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG 

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR 

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI 

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK*

ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap:

10 20 30 40 50 60 

orf7-l pep KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

I I I I I I 1 M I I I I I I I ! I I I I M I I I II M 
40 orf7nq-l YRIKIAKNQGI SSVGRKLAEDRIVFSRHVL 

10 20 30 

70 80 90 100 110 120 

orf7-l pep TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 
45 | | | ! M I I I I I I I M I I I I M I II I I I I I I I I I M M I I II I II II I! I I I I I I I I I M I 

orf7nq-l TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 

40 50 60 70 80 90 

130 140 150 160 170 180 

50 orf7-l pep T PD I GHDTKGWSNEKLMAEVAPDAFSGN PEGQFFPDS YE I DAGGSDLQI YQTAYKAMQRR 

| I I i I I | | I I I I I I I I t I I I M I I I I I I M I I I M I I I I I I M I I I I I I I I I I I I II I I I 
orf7ng-l TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 

100 110 120 130 140 150 

55 190 200 210 220 230 240 

orf 7-1 pep LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
Mill : 1 I M I M II I I I M II I I I : It I I I I I I I I I M 1 II I I I I I I I I M I I I I I I I 
orf7ng-l LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 

160 170 180 . 190 200 210 






250 260 270 280 290 300 

orf7-1.pep   IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS
             ||||||||||||||||||||||||||| |||||| ||||||||:||||||||||||||||
orf7ng-1     IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS

220 230 240 250 260 270 

310 320 330 

or f 7-1 .pep KMDGTGLSQFSHDLTEHNAAVRKYILKKX 
I I If I M I I I M I M I I I I I I I I I I I i I I 
orf7ng-l KMDGTGLSQFSHDLTEHNAAVRKYILKKX 

280 290 



In addition, ORF7ng-1 shows significant homology with a hypothetical E.coli protein:

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION 
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but
has 97 additional C-terminal residues [Escherichia coli] Length = 340

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 20/87 (22%), Positives = 40/87 (45%)

Query:  10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69
           G  ++G +L  D+I+    V      +    +    GTYR   +++  ++L+ +  G+
Sbjct:  49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108

Query:  70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96
              ++++EG R S   K +   P I H
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135

Score = 436 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 84/155 (54%), Positives = 111/155 (71%)

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179
           EG F+PD++   A  +D+ + + A+K M + ++ AW GR DGLPYK+  +++ MAS+IEK
Sbjct: 156 EGWFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239
           ET   ++RD VASVF+NRL+IGMRLQTDP+VIYGMG  Y GK+ +ADL   T YNTYT
Sbjct: 216 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274
           GLPP  IA PG  ++ AAAHP+   YLYFV+   G
Sbjct: 276 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312





Based on this analysis, including the fact that the H. influenzae YCEG protein possesses a possible
leader sequence, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 6 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 39>: 

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AG AT AT T T AC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTG.GAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 

3 5i AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 

4 01 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 
4 51 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 

1 . . RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 

51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 

101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 




151 HLDGREEVLA QADEGQ 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT AT G AAAC AG C 

5 101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

10 351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

4 01 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

4 51 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

15 go 1 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

6 51 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

7 01 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

7 51 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

?0 851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 

901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 AT GAT G TAT G CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

? 5 HOI AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG 

H51 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTT CCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

-301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

30 1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA 

14 01 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

14 51 GGCTTGCACC CGATAACGCT C AG ATT AT G A ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

35 ?601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 

16 51 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

17 01 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 
17 51 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 
1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>:

1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL 

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA 

351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG

401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS
451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 
551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 
601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of N. 
meningitidis: 

10 20 30 40 50 

orf9.pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
60 | | : I : I I : I : I : I M : I I It : I I I I I I I I I I I M 1 I I I 1 1 I t I I 11 i I I I 

orf9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA






WO 99/24578 






PCT/IB98/01665 



10 20 30 40 50 

60 70 80 90 100 110 

orf 9 . pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

I I I M I I I It I I I I M I II I I I I I I I I I I M I I I i I I I M I I I I I I I I M M I I I I ! I I 
orf9a AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA

60 70 80 90 100 110 

120 130 140 150 160 

orf 9 . pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 
I I I I 1 I I I I M I I I M I I M M I I M II I I I I I I I M I M I I I I I I i 
orf 9a EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
120 130 140 150 160 170 

orf9a AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI
180 190 200 210 220 230
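
The identity figures quoted for these alignments are ratios of matching columns to aligned columns. The sketch below shows one way to compute such a figure from two gapped rows of equal length. It is an illustration under stated assumptions: the function name `percent_identity` is hypothetical, and the choice to count every aligned column (gaps included) as part of the overlap is an assumption about how the alignment program reported it.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the aligned columns of two gapped sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where both rows are gaps; the rest form the overlap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    # A column matches when both rows carry the same residue (not a gap).
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# First 16 aligned columns of the ORF9 / ORF9a alignment above.
orf9 = "RFKMLTVLTATLIAGQ"
orf9a = "RFTILSVLAAALLAGQ"
print(round(percent_identity(orf9, orf9a), 1))  # 62.5
```

As a consistency check on the arithmetic only, 149 matching columns out of 166 would round to the 89.8% quoted above; the count of 149 is inferred from the percentage and is not stated in the text.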

The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 

1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT
51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG
101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT
201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA
351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA
401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA
451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG
501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG
551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA
601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA
651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC
851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC
901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG
951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA
1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA
1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG
1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC
1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT
1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA
1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT
1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG
1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC
1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA
1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT
1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT
1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC
1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC
1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA

This encodes a protein having amino acid sequence <SEQ ID 44>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI
51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG
151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR
201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER
301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI
351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV
401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT
451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG
601 IALPQPSRKP RK*



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 



10 20 30 40 50

orf9a.pep MLPARFTILSVLAAALLAGQAYAAG--AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA

Ml II : I : I I : I : I : I I I : III I - I I I I I I I I M I M M I I M ! I I I I I I I I I 
orf9-1 MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA
10 20 30 40 50 60 



60 70 80 90 100 110 

orf9a.pep AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA

I I I 1 M I I I I I If I I I II I I I I I I I I I II I I I M I I I I I I M I II II I I M I I 

orf9-1 AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA

70 80 90 100 110 120 

120 130 140 150 160 170 

orf9a.pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ
| I I I I I I I I I I I I I I I I I I M I 1 I I I I I I I M M I ! I I I I I lillll I I I I I I I I I I I 
orf 9-1 EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
130 140 150 160 170 180 

180 190 200 210 220 230 

orf9a.pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI
M I I M I M I M M I I I I I M M I II I M I I II M II I I I I II M I I I M I I I I II I I I 
orf9-1 AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI
190 200 210 220 230 240 

240 250 260 270 280 290 

orf9a.pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL

I I I I M M I I I I I I II I M I I M M I I I I I I I I I M I M M I M I I I II I I I ! I I I M II 
orf9-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL

250 260 270 280 290 300 

300 310 320 330 340 350 

orf 9a . pep ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT 
I | | | | | | I I I M I I M I II I I I M II I 1 M M I I I I! I I I : I I i : I M I : 1 I I M M : 
orf9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA
310 320 330 340 350 360 






360 370 380 390 400 410 

orf 9a . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 
M II I I M I I I II II I i M II I I II M M I M I M I M M I I II I I I I II I I II M I M 
orf 9-1 KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
370 380 390 400 410 420 

420 430 440 450 460 470 

orf9a.pep IQMFALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE
M I : M I I I M I II I II I I I II II I I I I I I I II I I I I I II I M I I I II I I I I I 1 I I I M I 
orf9-1 IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE
430 440 450 460 470 480 

480 490 500 510 520 530 

orf 9a . pep RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD 
I I I I M I I M I I M I I I I I I I M I II I I II II I I M M I I I I M II I I I I I I I I M I I I 
orf 9-1 RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
490 500 510 520 530 540 






540 550 560 570 580 590 

orf 9a . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
I | | I M M M I II 1 M I II I I I I 1 I I I I M III I I I M I I M M I I M I II II I I I II I I 
orf 9-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
550 560 570 580 590 600 



600 610
orf9a.pep HGIALPQPSRKPRKX
|||||||||||||||
orf9-1 HGIALPQPSRKPRKX
610






Homology with a predicted ORF from N. gonorrhoeae

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. gonorrhoeae:



orf9 RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 54
| ! : | : | | : I : I : I I I : II It:!:: I I I I I I I : I I : : I I I I I I I I I it I I
orf9ng MIMLPARFTILSVLAAALLAGQAYAA--GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58

orf9 LAAVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 114
| | | | I | | I I : : I I II I I M I I I M I I I I I i II M M I M i I M M I I I I I I I I > M I I M
orf9ng LAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 118

orf9 QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 166
| M | | | I I I I I I I I I I I : I I I I I II I I I I : I II III I I I ! I : I
orf9ng QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQSDYVHQPMIFLLL 178



The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein having amino acid sequence <SEQ ID 46>:



1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE
51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS
101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE
151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA
201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF
251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT


Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
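
Transmembrane predictions of this kind are commonly supported by a hydropathy calculation over the candidate segment. The text does not say which method produced the prediction, so the sketch below is only an illustration: it assumes the Kyte-Doolittle hydropathy scale, and the names `KD` and `mean_hydropathy` are hypothetical. It averages the scale over residues 173-189 of <SEQ ID 46>.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein: str, start: int, end: int) -> float:
    """Mean hydropathy of residues start..end (1-based, inclusive)."""
    window = protein[start - 1:end]
    return sum(KD[res] for res in window) / len(window)


# Residues 151-200 of <SEQ ID 46>; index 0 of this string is residue 151.
segment = "GGNPHLDRLEEVPAQSDYVHQPMIFLLLVQAAVQHGGVAQKPSKAVRPAA"
tm = mean_hydropathy(segment, 173 - 150, 189 - 150)
print(round(tm, 2))  # 1.62
```

A positive mean of this size over a 17-residue window indicates a predominantly hydrophobic stretch, which is consistent with, but does not by itself establish, a membrane-spanning segment.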



Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>:

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT
51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG
101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT
201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA
351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg
401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa
451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT
501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg
551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG
601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA
651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC
851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC
901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG
951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA
1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA
1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG
1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC
1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC
1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG
1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT
1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA
1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC
1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA
1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT
1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt
1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC
1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA
1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA

This encodes a protein having amino acid sequence <SEQ ID 48>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAADVE LPKEVGKVLR KHRRYSEEEI
51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG
151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK
201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH
301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI
351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV
401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST
451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG
601 IALPEPSRKP RK*

ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap: 

10 20 30 40 50 60 

orf9-1.pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA

I I ! f I : I : M: I: 1 : 1 I I : I I I I : I : : I I 1 I I I I : I I : : I II I M II M I I I I I 
orf9ng-1 MLPARFTILSVLAAALLAGQAYAAG--AADVELPKEVGKVLRKHRRYSEEEIKNERARLA

10 20 30 40 50 

70 80 90 100 110 120

orf9-1.pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA

I I M I I 1 I I t I II I I I I I I I I I M I II t I I I I I I I I II I I 1 I I I I I I M I I I 1 I I I II 
orf9ng-1 AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA

60 70 80 90 100 110 

130 140 150 160 170 180

orf9-1.pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ
j | | | | | | || | | I I I I : I I I I I I I I I I I : I I 11 I I I I I : M M I : I : I : I I : I I M : I 
orf9ng-1 EMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ

120 130 140 150 160 170

190 200 210 220 230 240 

orf9-1.pep AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI

" " | | | | | |:|||| I I I I I I I I I I I I I II I : I I : I I I M I M I I MINIMUM 

orf9ng-1 AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI

180 190 200 210 220 230 

250 260 270 280 290 300 

orf 9-1 pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
45 " I I M i II I I I I I I I M I I I I IN I I I I I I II I I I I I I IN 1 I II I I : : I I I I I I I I I I I 

orf9ng-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL
240 250 260 270 280 290 

310 320 330 340 350 360 

orf9-1.pep ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA

I : 1 I | I : I I I I II I I I I M I I M I I I I I I I I 1 M 1 I I I I I I : I I I : I I 1 I : I I I I M M 
orf 9ng-l EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420

orf 9-1 . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
| 1 | | | M | M I I I I I I II I II I I I M : I II I M I I I I I I I M I M I I I I I I I I I I I I I I I 
orf 9ng-l KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
360 370 380 390 400 410 


430 440 450 460 470 480 

orf9-1.pep IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE
I I I I I 1 11 1 I M II II I I : : I I I I : : : I I II I I : I II : : I : : : 11 I I I I : I M 
orf 9ng-l IQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE 
420 430 440 450 460 470

490 500 510 520 530 540
orf9-1.pep RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD

I :: t : I I I I I I II I I M I I I : I I I 1 I i I M I 1 I I II t I I I I I I I I M I I I I I M 

orf 9ng-l TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
480 490 500 510 520 530 


550 560 570 580 590 600 

or f 9-1 . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
| | | | I | I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I > I I I I I i 
orf9ng-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR
540 550 560 570 580 590

610 

orf 9-1 . pep HGIALPQPSRKPRKX 
: I I! I I : I I I I I I I I 
orf9ng-1 YGIALPEPSRKPRKX

600 610 

In addition, ORF9ng shows significant homology with a hypothetical protein from P. aeruginosa: 

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION
(ORF3)

>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259

(X82071) orf 3 [Pseudomonas aeruginosa] Length = 576
Score = 128 bits (318), Expect = 1e-28

Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%)

Query: 67 VFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQAEMIYQKWR 126

+++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A LA ++A W 
Sbjct: 53 LYSLLVAELAGQRNRFDIALSNYWQAQKTRDPGVSERAFRIAEYLGADQEALDTSLLWA 112 

Query: 127 QIEPIPGEAQKPAG WLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRI 172 

30 - - + p +AQ+ A ++ VL G+ H D L A++D + + 

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Query: 173 FXXXXXXXXXXXXXXXKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 232 

++ KY + + A+ Q ++A+ L+ + 

Sbjct: 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214






Query: 233 KLDTE I LPPTLMTLRLTARK YPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287 

E+PL+L + K P + G E D + + + + LV + 

Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 

Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y + + + 

Sbjct: 271 DDAKAEFAGLVQQFPDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 



Query: 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371

LA +K+ A +D YA+ GG + T++ARDAR + P+ 

Sbjct: 331 RLAEEQKDTARALDEYAQ — VGPGNDFLPAQLRQTDVLLKAGRVDEAAQRLDKARSEQPD 388 

Query: 372 YLFDKXXXXXXXXXXXXXXXXXXRQIGRVRKLPEQQGRYFTADNLSKIQMLALSKLPDKR 431 
50 Y A L 1+ ALS + 

Sbjct: 389 Y AIQLYLIEAEALSNNDQQE 408 

Query: 432 EALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLETALKLTPDNAQIM 491 
+A + ++ ELL RS + + E+ +M DL + PDNA + 

Sbjct: 409 KAWQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 462

Query: 492 NNLGYSLLSDSKRLDEG FALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551 

N LGY+L + R E L+ A+++NPDD A+ DS+GW Y +G A YLR + + 
Sbjct: 463 NALGYTIADRTTRYGEARELILKAHKLNPDDPAILDSMGWINYRQGKLADAERYLRQALQ 522 



Query: 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

p+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 

Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 



>gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545

Score = 81.5 bits (198), Expect = 1e-14

Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%) 

Query: 408 GRYFTADNL-SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQ 459
G Y A L K ++LA PDK+E L + +K + + L +
Sbjct: 335 GNYEDAKRLIEKAKVLAPDKKEILFLEADYYSKTKQYDKALEILKKLEKDYPNDSR 390

Query: 460 RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS--DSKRLDEGFALLQ 513
Sbjct: 391 VYFMEAIVYDNLGDIKNAEKALRKAIELDPENPDYYNYLGYSLLLWYGKERVEEAEELIK 450

Query: 514 TAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSF-ENDPEPEVAAHLGEVLWALGER 572
A + + P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G +
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPWNEHVGDVLLKMGYK 510

Query: 573 DQAVDVWTQAAHLRGDKK 590
++A + + +A L + K
Sbjct: 511 EEARNYYERALKLLEEGK 528





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 7 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>: 

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CGaCTGGGCG 

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

701 GCCCAAGGCG AAGTCGTTTC CTAA

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>:

1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG
51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ
101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF
201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS *
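
The X residues in this partial sequence coincide with the lower-case ambiguity codes (s, w, r, k) in the nucleotide listing. One plausible way to reproduce them, sketched below, is to expand each ambiguous codon into every codon it could stand for and to emit X whenever the expansions disagree. This is an illustration only: the function name `translate_ambiguous` is hypothetical, and the rule for emitting X is an assumption rather than the documented behaviour of the software used here.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in-frame DNA; emit 'X' where ambiguity changes the residue."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        # Every concrete codon this (possibly ambiguous) codon could be.
        options = product(*(IUPAC[base] for base in seq[i:i + 3]))
        residues = {CODON_TABLE["".join(codon)] for codon in options}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Codons spanning the end of line 551 and the start of line 601 of <SEQ ID 49>.
print(translate_ambiguous("TTGGTTTTCTCsGwCrTGTTC"))  # LVFSXXF
```

Here TCs still resolves to serine because both TCC and TCG encode it, whereas GwC (GAC or GTC) and rTG (ATG or GTG) do not resolve and appear as X, matching the LVFSXXF stretch in the protein listing.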

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>:

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 
1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC
1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC
1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC
1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC
1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT
1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT
1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT
1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA



This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH
251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK
301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV
501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS*






Computer analysis of this amino acid sequence gave the following results: 

Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida
ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp).

ORF11 2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61
LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K
60K 324

ORF11 62
+ + PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+
60K 384

ORF11 122
L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P
60K 444

ORF11 182
DPMQAK+MK+MP++ PAG VLYWWNN L+I+QQW+I R IE
60K 504






Homology with a predicted ORF from N. meningitidis (strain A) 

ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N.
meningitidis:

10 20 30 

orf11.pep NLYAGPQTTSVIANIADNLQLAKDYGKVHW

I I I I I ! I I I I I I M I II I I I I II I I I I I I 
orf 11a IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW 
280 290 300 310 320 330 

40 50 60 70 80 90 

orf11.pep FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11a FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE

340 350 360 370 380 390
100 110 120 130 140 150

orf11.pep KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11a KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI

400 410 420 430 440 450 

160 170 180 190 200 210 

orf11.pep TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY

I | | | I | I I I I M I I I I I I I I M I I M I I M I I I I I I M M I I I I I I I 1 I III 

orf11a TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY
460 470 480 490 500 510 

220 230 240 

orf11.pep WVVNNLLTIAQQWHINRSIEKQRAQGEVVSX

t I : M I I I I I I I I I I I I II I I M M II I I I I 
orf11a WVINNLLTIAQQWHINRSIEKQRAQGEVVSX
520 530 540 

The complete length ORF11a nucleotide sequence <SEQ ID 53> is:

1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT 

51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC 

101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA 

301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG
801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA
901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT 
951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC 

1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC

1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT

1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 54>:

1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL
51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX
101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK
151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH
251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK
301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI
351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD
401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP
451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV
501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS*

ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap:

10 20 30 40 50 60 






orf11a.pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT
I I I I I I I I I I I I I It I 1 I II I I I I M i f I I : I I I M I : I I I I I I I I I : M I I I I 
orf 11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf lla . pep DTVQAVIDEKSGDLRRLTLLKYKATGDXNKPFILFGDGKXYTYXAXSELLDAQGNNILKG 
I I I I I I I I I I I I I ! I ! J I I I I 1 I I II I I I I I M I i I I I III I I I I I I I I I I I M I I 
orf 11-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf lla . pep IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 
I I I I I I I I I I I II I I I I I I I II I I I I I II I I I I I I I I I I I j I I I I I I I I I I I I I I II I I I 
orf 11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf11a.pep SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT
I I 1 I M I I I I I t I II I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I II I I I I I I I I 
orf11-1 SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT

190 200 210 220 230 240 

250 260 270 280 290 300 

orf11a.pep XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRRNDKLYSTSVSVPLAAIQNGAK
I I I I 1 M I I I I I I I II I I I I I 1111111:1 M I I M I I I M I I I I I I I I I I I I I I.I 
orf 11-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQNGAK 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf11a.pep SXASINLYAGPQTTSVIANIADNLQLXKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV
: I M I I I M I I I I t I I I I I I I II II I I M I I I I M I I I I II I I I I I I M II I I M I I I 
orf 11-1 AEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf11a.pep LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL

I I I I I I I I I M I I I I I I I I I I M I I M I I I M I M I I I I I M i I I I I I I I I I I I I I I M I 
orf 11-1 LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf lla . pep GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 
I II I I I I I I M I I I I I M I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I II II M II I I 
orf 11-1 GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf11a.pep LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLTIAQQWHINRSIEKQRAQ
I I M I I II i I I I I I I I I I t I I II I I I I I I I I II : I I I I I I 1 I I I I I I Ml I I I I I I 
orf11-1 LNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQ

490 500 510 520 530 540 



orf11a.pep GEVVSX
||||||
orf11-1 GEVVSX



Homology with a predicted ORF from N. gonorrhoeae

ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N.
gonorrhoeae:






orf11 NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57

I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I II I I I 1 I 11 I I I I I II I I I I I : I I I 
orf11ng MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIVVLT 60



BNSDOC!D: <WO 9924578A2_I_> 



WO 99/24578 



PCT/IB98/0166S 



-89- 



10 



15 



orfll 
orf ilng 


ITVKAVLVPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 

1 I j I I 1 j 1 1 | { | | | j | | I I I I I M 1 1 : 1 1 : 1 1 1 1 1 I i 1 i I'M 1 1 1 1 1 1 1 ^ 1 1 : i 1 M M 
IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 


117 
120 


orfll 


CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 

• | 1 i 1 1 i 1 i t 1 M 1 1 1 1 1 1 1 1 i I 1 ! i 1 1 1 1 II 1 1 1 II It 1 1 1 1 M 1 1 

| | | j | | | | | | | 1 I 1 1 II 1 M II 1 1 1 1 1 1 1 1 1 l l 1 M l i I i I I i I i i i r i i i i i l i i i i i i 

CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 


177 


orf ling 


180 


orfll 


PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWWNNLLTIAQQWKINRSIEKQRAQGE 
M 1 M M 1 1 1 II II 1 i 1 1 II MINN 1 M 1 1 II M 1 1 1 1 M 1 1 1 II 1 i 1 1 1 1 1 1 1 1 
PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGE 


237 


orf ling 


240 


orfll 


WS 240 




orf ling 


1 1 i 

WS 243 





An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino
acid sequence <SEQ ID 56>: 



1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG
51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM
101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL
151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS
201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS*



Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 



1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT
51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC
101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC
151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT
201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG
251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA
301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT
351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG
401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA
451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA
651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg
701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac
751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg
801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt
851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca
901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT
951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG
1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC
1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA
1101 AGCCGTACTG TATCCATTGA CCAACGcctc ctACCGTTCG ATGGCGAAAA
1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC
1201 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA
1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT
1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA
1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT
1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC
1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG
1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG
1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA
1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A



This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>:



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL
51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE
101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK
151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT
201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH
251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP
301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN
351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD
401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA
451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL
501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS*

ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap:



                       10        20        30        40        50        60
orf11ng-1.pep  MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT
               |||||||||||||||||||||||||||||||||||||| || ||||||||||||||||||
orf11-1        MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf11ng-1.pep  DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG
               |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
orf11-1        DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf11ng-1.pep  IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL
               |||||||||| | || |||||||||| |||||||||||| ||||||||||||||||||||
orf11-1        IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf11ng-1.pep  SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf11ng-1.pep  PTGWLGMIEHHFMSTWILQPKGGQNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGP
               |||||||||||||||||||||| | ||| | | |||||||||||| |||||| ||   |
orf11-1        PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA
                      250       260       270       280       290

                      310       320       330       340       350       360
orf11ng-1.pep  KPKMAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIV
               |     |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        KAEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAII
             300       310       320       330       340       350

                      370       380       390       400       410       420
orf11ng-1.pep  VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP
               |||||||||||||||||||||||||||||||| |||||||||||||||||||| ||||||
orf11-1        VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP
             360       370       380       390       400       410

                      430       440       450       460       470       480
orf11ng-1.pep  LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT
             420       430       440       450       460       470

                      490       500       510       520       530       540
orf11ng-1.pep  YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf11-1        YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA
             480       490       500       510       520       530

orf11ng-1.pep  QGEVVSX
               |||||||
orf11-1        QGEVVSX
             540

In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the
database (accession number p25754): 






ID 60IM_PSEPU STANDARD; PRT; 560 AA. 

AC P25754; 

DT 01-MAY-1992 (REL. 22, CREATED) 

DT 01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE) 

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 60 KD INNER-MEMBRANE PROTEIN. . . .

SCORES Init1: 1074 Initn: 1293 Opt: 1103

Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap 

10 20 30 40 

MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ 

||:|| ::!::::!:::! : : I I I I I I : : : I : : 

MDIKRTILIAALAWSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 
10 20 30 40 50 60 

50 60 70 80 90 

AAT AS AE AALAPAT PIT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

: : i : | | : : I : I : : I II::: : I I : I I : : I : I I I I : I M 

VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 
70 80 90 100 110 120 



orfllng-1 .pep 
p25754 

orfllng-1 .pep 
p25754 



100 110 120 130 140 

orfllng-1. pep VL FG DGKE YT YVAQS E L LDAQGNN I LKG I G FSAPKKQYT L-NG D TVEVRLSAPE 

I I : I { : i : I I I I : : I : : : I : : I : i : I I : I : : I : : : : I 

p257 54 QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQLADGQEQLWDLKFS 

130 140 150 160 170 

150 160 170 180 190 200 

or f ling- 1 . Dep TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRIVRDHS-EPEGQGYF-THSY 

||:: I : : I : I : I I : I I 1 I I : I : : : I j I : ! : : I : I 

D 2 57 54 DNGVNYIKRFSFKRGEYDLNVSYLIDNQSGOAWNGNMFAQLKRDASGDPSSSTATGTATY 

180 190 200 210 220 230 

210 220 230 240 250 260 

orf ling- I .pep VGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 
: I : : : I : : I M : : I : I I : : : I : : It : : : : I : I : : : I I I : 

P 25754 LGAALWTASEPYKKVSMKDID KGSLKE NVSGGWVAWLQHYFVTAWI-PAKSD 

240 250 260 270 280 

270 280 290 300 310 320 

orf Una- 1 .pep QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD 

: j I :::::: I : : I : : : I : I ! : : : I I I It : I : : : : 

p257 54 NNV VQTRKDSQGNYI IGYTGPVISVPA-GGKVETSALLYAGPKIQSKLKELSP 

290 300 310 320 330 

330 340 350 360 370 380 

orf Una- 1 .Dep N LQLAKDYGKVHWF- AS PLFWLLNQLHN 1 1 GNWGWAI WLT 1 1 VKAVLYPLTNAS YRSMA 
: I : I : III : II I : ! : I II ! : : : I : : : I I It I : 1 : I I I : : : I : : : : I I : I 1 I I I I I 
p257 54 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAAS YRSMA 

340 350 360 370 380 390 



390 400 410 420 430 440 

orf ling- 1 .pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 
: I I I : I I I I : : I I : : I I I I : : : I I I I : M I II I I I II I I I I : I : I : 1 I I : : I I I : I : 
p257 54 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 
400 410 420 430 440 450 

450 460 470 480 490. 500 

orfllng-1 .pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF 
111:11111: I M I I I I I :: I I I I I I : I I II i Ml I I I I I I I : II : I I : : I 
p257 54 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 4 90 500 510 



510 520 530 540 

orfllng-1 . pep SVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQGEWSX 

: : I :: | | I I 1 t I I I I I I I I : I : I I I : I : I I I 
p257 54 TFFFLWFPAGLVLYWWNNCLSISQQWYITRRIEAATKKAAA 
520 530 540 550 560 
Based on this analysis, including the homology to an inner-membrane protein from P. putida and
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
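
The text does not name the program used to predict the transmembrane domains. Purely as an illustration, a Kyte-Doolittle sliding-window hydropathy scan is one common way to flag candidate membrane-spanning stretches. The 19-residue window and the 1.6 cutoff are conventional choices assumed here, not values taken from this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Return (start, mean_score) for every full window; start is 1-based.

    Residues outside the scale (for example X) score 0.0, which is an
    assumption of this sketch.
    """
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    out = []
    for i in range(len(scores) - window + 1):
        out.append((i + 1, sum(scores[i:i + window]) / window))
    return out


def candidate_tm_starts(seq, window=19, cutoff=1.6):
    """Start positions of windows whose mean hydropathy reaches the cutoff."""
    return [s for s, mean in hydropathy_windows(seq, window) if mean >= cutoff]


# First 30 residues of ORF11ng-1 (SEQ ID 58).
n_term = "MDFKRLTAFFAIALVIMIGWEKMFPTPKPV"
hits = candidate_tm_starts(n_term)
print(hits)
```

A run of consecutive start positions in `hits` marks one hydrophobic stretch; dedicated predictors apply further rules (charge distribution, helix length) that this sketch leaves out.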

Example 8

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 59>:

1 ..GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT
51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA
101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG
151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA
201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA
251 ACCGTTACGA AGTT.TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG
301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA
351 AGGCAACCTT CTTATTATCA CACACCCTTA A

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*
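
The correspondence between a partial DNA sequence and its amino acid sequence can be sketched as a frame-1 translation with the standard genetic code. This is an illustrative sketch only: it assumes the reading frame starts at the first listed base, and the function name `translate` is hypothetical. Any codon containing an undetermined base (N or '.') is reported as X.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 1 of a DNA string; whitespace is ignored.

    Codons not found in the table (ambiguous bases) become X, and a trailing
    partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First three blocks of SEQ ID 59.
print(translate("GCCGTCTTAA TCATCGAATT ATTGACGGGA"))  # AVLIIELLTG
```

A fuller implementation would resolve codons whose third base does not affect the residue (GTN, for example, always encodes V) instead of emitting X for them.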

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>: 

1 ..GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT
51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA
101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG
151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA
201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA
251 ACCGTTACGA AGTTTTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG
301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA
351 AGGCAACCTT CTTATTATCA CACACCCTTA A

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVfY RGThWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF13 shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) from strain A of N.
meningitidis:

                                10        20        30        40        50
orf13.pep               AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                        |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13a         MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                       10        20        30        40        50        60

                      60        70        80        90       100       110
orf13.pep      VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA
               ||||||| ||||||||||||||| ||||| ||||||| |||| |||||||||||||||||
orf13a         VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
                       70        80        90       100       110       120

                     120
orf13.pep      LIVRKEGNLLIITHPX
               ||||||||||||  ||
orf13a         LIVRKEGNLLIIAKPX
                      130

The complete length ORF13a nucleotide sequence <SEQ ID 63> is: 

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA

This encodes a protein having amino acid sequence <SEQ ID 64>: 

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP*
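
The sequence listings use a leading position counter followed by blocks of ten residues. A small consistency check, sketched below, joins the blocks and confirms that each counter equals the number of residues read so far plus one. The helper name `check_listing` is hypothetical; the example rows are the last two rows of SEQ ID 63.

```python
def check_listing(listing):
    """Join a numbered block listing and flag counters that do not match.

    Returns (sequence, problems), where problems holds (found, expected)
    counter pairs. The first counter seen is taken as the reference point,
    so a fragment that starts mid-sequence can be checked as well.
    """
    parts = []
    length = 0
    base = None
    problems = []
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].isdigit():
            counter = int(tokens[0])
            if base is None:
                base = counter - length
            elif counter != base + length:
                problems.append((counter, base + length))
            tokens = tokens[1:]
        for tok in tokens:
            tok = tok.rstrip("*")  # drop the stop marker
            parts.append(tok)
            length += len(tok)
    return "".join(parts), problems


rows = """351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA
401 AACCTTAA"""
seq, problems = check_listing(rows)
print(len(seq), problems)
```

A non-empty `problems` list points at a row where a residue was dropped or duplicated during transcription.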

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap:

                       10        20        30        40        50        60
orf13a.pep     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                        |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13-1                 AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                                10        20        30        40        50

                       70        80        90       100       110       120
orf13a.pep     VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
               ||||||| ||||||||||||||| ||||| ||||||||||||||||||||||||||||||
orf13-1        VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
                      60        70        80        90       100       110

                      130
orf13a.pep     LIVRKEGNLLIIAKPX
               ||||||||||||  ||
orf13-1        LIVRKEGNLLIITHPX
                     120



Homology with a predicted ORF from N. gonorrhoeae 

ORF13 shows 89.7% identity over a 126aa overlap with a predicted ORF (ORF13.ng) from N.
gonorrhoeae: 



orf13               AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF  51
                    |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng    MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF  60

orf13      VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA  111
           ||||||| ||||||||||| | | |||| |||||||| |||| |||||||||  ||||||
orf13ng    VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA  120

orf13      LIVRKEGNLLIITHP  126
           ||||||||||||  |
orf13ng    LIVRKEGNLLIIANP  135





The complete length ORF13ng nucleotide sequence <SEQ ID 65> is:

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA
This encodes a protein having amino acid sequence <SEQ ID 66>:

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP*

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1:

                                10        20        30        40        50
orf13-1.pep             AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                        |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng        MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                       10        20        30        40        50        60

                      60        70        80        90       100       110
orf13-1.pep    VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
               ||||||| ||||||||||| | | |||| |||||||||||||||||||||||  ||||||
orf13ng        VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA
                       70        80        90       100       110       120

                     120
orf13-1.pep    LIVRKEGNLLIITHPX
               ||||||||||||  ||
orf13ng        LIVRKEGNLLIIANPX
                      130
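
The identity percentages quoted for each comparison come from the alignment software. The underlying arithmetic can be sketched as below; the function name is hypothetical, and the handling of gaps and of X (X matches only X, every aligned column counts toward the overlap) is an assumption of this sketch, since alignment programs differ on these points.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two equal-length rows of a pairwise alignment.

    Columns where both rows hold a gap are skipped; a residue facing a gap
    counts as a mismatch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(row_a, row_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# C-terminal segments of ORF13-1 and ORF13ng from the alignment above.
tail_13_1 = "LIVRKEGNLLIITHPX"
tail_13ng = "LIVRKEGNLLIIANPX"
print(percent_identity(tail_13_1, tail_13ng))  # 87.5
```

Applied to the full rows of an alignment rather than one segment, the same function gives the overall figure for that comparison.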

Based on this analysis, including the extensive leader sequence in this protein, it is predicted that
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins
from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines
or diagnostics, or for raising antibodies.



Example 9 

The following DNA sequence was identified in N. meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT

401 ATGCCGTC..

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>:

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV.. 

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT
51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC
101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT
151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA
201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA
251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA
301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA
351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA
401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG
451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG
501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG
551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDFRPKHRAK PKLRVRKS*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 71>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT

601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDLRPKSRAK PKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa 
overlap with ORF2a: 

                       10        20        30        40        50        60
orf2.pep       MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR
               | |||||||||||||||||||||| ||||| |||||||||||||||||||||||||||||
orf2a          MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf2.pep       KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2a          KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP
                       70        80        90       100       110       120

                      130
orf2.pep       RCGKHPIRRHFRRYAV

orf2a          DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV
                      130       140       150       160       170       180

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap:

orf2a.pep  MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR  60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1     MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR  60

orf2a.pep  KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP  120
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf2-1     KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP  120

orf2a.pep  DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV  180
           |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1     DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV  180

orf2a.pep  QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX  229
           |||||||||||||||||||||||||||||||| ||| ||||||||||||
orf2-1     QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX  229



Further work identified a partial DNA sequence <SEQ ID 73> in N. gonorrhoeae encoding the 
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV* 

Further work identified the complete gonococcal gene sequence <SEQ ID 75>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT 

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tccccttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg 

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc 

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA 

651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA 

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-1>:



1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR
151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT
201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa
overlap with ORF2ng:

orf2.pep   MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR  60
           | ||||||| |||||||||||||| ||||| |||||||||||||||||| ||||||||||
orf2ng     MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR  60

orf2.pep   KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS  120
           | || ||||||||||||||| |||   ||||||||||||||||||||||||||| ||
orf2ng     KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP  120

orf2.pep   RCGKHPIRRHFRRYAV  136
           | ||| ||||||||||
orf2ng     RYGKHRIRRHFRRYAV  136

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in
229 aa overlap:

                       10        20        30        40        50        60
orf2-1.pep     MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR
               ||||||||| ||||||||||||||||||||||||||||||||||||||| ||||||||||
orf2ng-1       MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf2-1.pep     KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
               | || ||||||||||||||| |||   |||||||||||||||||||||||||||||||||
orf2ng-1       KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf2-1.pep     DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV
               | ||| |||||||||||||  |||||||  |||||||||||| |||||||||||||||||
orf2ng-1       DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV
                      130       140       150       160       170       180

                       190       200       210       220      229
orf2-1.pep     Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX
               |  ||||||||||||||||||||||||| ||||| |||||||||||||||
orf2ng-1       QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX
                      190       200       210       220       230

Computer analysis of these amino acid sequences indicates a transmembrane region (underlined),
and also revealed homology (34% identity, 59% positives) between the gonococcal sequence and
the TatB protein of E. coli:

gnl|PID|e1292161 (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07
Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1  MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60
          MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +
Sbjct: 1  MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60

Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87
            +K+  +A+   +   LK +  +++ +
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88
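
The summary line of the BLAST report can be cross-checked against the alignment itself. In the midline, a letter marks an identical residue and '+' marks a conservative substitution, so identities are the letters and positives are the letters plus the '+' marks. The sketch below assumes midline strings transcribed with their column spacing restored; the function names are hypothetical and the percentages are rounded to the nearest whole number.

```python
def midline_counts(midline):
    """Count (identities, positives) in a BLAST midline string."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def blast_percentages(identities, positives, gaps, length):
    """Whole-number percentages for a BLAST summary line."""
    return tuple(round(100 * n / length) for n in (identities, positives, gaps))


# Midlines for query residues 1-60 and 61-87 (spacing restored by hand).
midlines = [
    "MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +",
    "  +K+  +A+   +   LK +  +++ +",
]
identities = sum(midline_counts(m)[0] for m in midlines)
positives = sum(midline_counts(m)[1] for m in midlines)
print(identities, positives)                            # 30 52
print(blast_percentages(identities, positives, 1, 88))  # (34, 59, 1)
```

The 59% figure therefore corresponds to the Positives field of the report, while the fraction of identical residues is 34%.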

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane
proteins and so the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results
of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise mice,
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis
(Figure 3D). These experiments confirm that ORF2-1 is a surface-exposed protein, and that it is
a useful immunogen.

Example 10 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 77>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT
101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC
251 ATTGATGCAC kGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC
301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG
351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC
401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA
451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC
501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC
551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA
601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG..

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>: 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEM..

Further work revealed the complete nucleotide sequence <SEQ ID 79>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT
101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA
251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC
301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT
401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA
701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA
801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA
951 AGGACAACCT TGA

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN
301 SHEGYGYSDE VVRQHRQGQP *

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 81>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT
101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA
251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC
301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT
401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA
701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA
801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC
851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA
951 AGGGCAACCT TGA

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN
301 SHEGYGYSDE AVRRHRQGQP *

The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa
overlap with ORF15a:

10 20 30 40 50 60 

orf 15 . pep MQARLLIPILFSVFILSA CGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 
I I I I I J I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M ! I I I I I I I M ! I I I 
orf 15a MQARLL IPILFSVFIL SA CGT LTG I PS HGGGKR FAVEQE LVAAS ARAAVK DMDLQALHGR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 15. pep KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 
I I I II I I I I I M I ! I II I I I I ! I I II M I I I I I I I I I I II I I II I I I I I I I I I I I I I 
orf 15a KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 15. Deo LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I I I I I M I I I II I I I M I I I I I I I I I M I I M I I M I I I I I I I I I II I I M I I I I I I 
orf 15a LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
130 140 150 160 170 180 



190 200 210 

orf 15. oeo FLRGI DVVS PAN ADTDVFINI DVFGTIRNRTEM 
I I I I I i M I I I I I II M I I I I I I I I II I I I I I I 
. 30 orf 15a FLRGI DWS PANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 

190 200 210 220 230 240 
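The identity figures quoted for these overlaps can be reproduced by counting identical residues position by position. The sketch below is a minimal illustration, not the alignment program used in the source: the function name `percent_identity` is hypothetical, the alignment is assumed to be ungapped, and an unknown residue (`X`) is assumed to count as a mismatch. It compares residues 1-60 of ORF15 and ORF15a.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Assumes an ungapped alignment of equal length; 'X' (unknown
    residue) is never counted as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "X")
    return 100.0 * matches / len(a)


# Residues 1-60 of the partial ORF15 and of ORF15a, grouped in tens.
orf15_1_60 = "".join(
    "MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK DMDLQALHGR".split()
)
orf15a_1_60 = "".join(
    "MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK DMDLQALHGR".split()
)

print(round(percent_identity(orf15_1_60, orf15a_1_60), 1))
```

Over the full 213-residue overlap the only differences between the two listed sequences are the four undetermined (X) positions in ORF15, that is 209 identical residues out of 213, which is consistent with the stated 98.1%.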

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap: 



10 20 30 40 50 60 

orf 15a. oeo MQARLLI PI LFSV FI LSACGTLTG I PS HGGGKR FAVEQE LVAAS ARAAVK DMDLQALHGR 

35 I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I II I I M I I I I I I I I M I M I I I I 

orf 15-1 MQARLLI PILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

40 orf 15a. peD KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

1 I M I I I I I I I I I II I I I I I I I I I I 11 I I II I M ! M I I M I I I I I I I I I I I I I II I I I I 
orf 15-1 KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 

70 80 90 100 110 120 

45 130 140 150 160 170 180 

orf 15a .peo LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 
I I I I M I I I I It I I I I I I M I I I I M I I I I II I I I I M II I I I II f I I I I I I I I I I I U I 
orf 15-1 LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMG DYRNETLTTNPRDTAFLSHLVQTVF 

130 140 150 160 170 180 

50 

190 200 210 220 230 240 

orf 15a. peo FLRGIDWSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 
I I I I I I I I I II I I I I I I I I M I I I I I I I ! U I I I I I I I I I I I I I M 1 I I M I I I I I II I I 
orf 15-1 FLRGIDWSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 
55 190 200 210 220 230 240 



250 260 270 280 290 300 

orf 15a . pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN 
I I I I I I I ill I I I I I I II I I I I I I I I I I I I I I I I I II I I I M : I I M I II I M I I I I I I 
60 orf 15-1 IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN 

250 260 270 280 290 300 

310 320 
or f 1 5a . pep SHEGYGYSDEAVRRHRQGQPX 
65 I I I I I I I I I I : | I : | | I I I II 

o r f 1 5 - 1 SHEGYGYS DE WRQHRQGQPX . 
Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 83>:

1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT
101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA
251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC
301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT
401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA
701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA
801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA
951 AGGGCAACCT TGA

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>:

1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN
301 SHEGYGYSDE AVRQHRQGQP *

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa
overlap with ORF15ng:

orfl5. pep MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 60 

I : I II I II I I 1 I I I I M I M I I I M M 11 I I I I 1 I I I I I I I I I I M I I I I I I M I I I I I 
35 orf 15ng MRARLLIPILFSVFILSACGTLTGI PSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60 






orf 15. pep . KVALYIATMGDQGSGSLTGGRYS I DAXXXGEYINS PAVRTDYTYPRYETTAETTSGGLTG 120 

I I I I I 1 I I I I I I I I I 1 I I I I I I I I I I I II i I 1 I f I I I 1 I I I t I i I I I I j I I I ! I I I ! 

orf!5ng KVALYIATMGDQGSGSLTGGRYS I DALIRGEYINSPAVRT DYTYPRYETTAETTSGGLTG 120 

orf 15. pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180 

I II I I I I I 1 I 1 I I I I I I M I II I : I I I I I I II M II I I I II ! I I I I M I M I I I II I M I 

orfl5ng LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180 



45 orf 15. pep FLRGIDWSPANADTDVFINIDVFGTIRNRTEM 213 

I I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I 
orf 15ng FLRG I DVVS PAN ADTDV FI N I DVFGT I RNRTEMHLYN AET LKAQTKLEYFAVDRTNKKLL 24 0 

The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap: 

10 20 30 40 50 60 

50 orf 15-1 .pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 

I : I I I I I I I I i 1 I I I M I I I I I i I I I I I I i I I I I I I I I I I I I I I I I I I I M II I I I I I I I 
0rfl5ng MRARLLI PILFSVFILSACGTLTGI PSHGGGKRFAVEQEL VAASARAAVK DMDLQALHGR 

10 20 30 40 50 60 

55 70 80 90 100 110 120 

orf 15-1 .pep KVALYIATMGDQGSGSLTGGRYS I DALIRGEYINS PAVRTDYTYPRYETTAETTSGGLTG 
I I I I I I II M I II I I I I M I I I I I M I I I I I I I I I II I I I I I I I M I I M I I I I I I I 1 I I 
orfl5ng KVALYIATMGDQGSGSLTGGRYS I DALIRGEYINS PAVRTDYTYPRYETTAETTSGGLTG 

70 80 90 100 110 120 






                     130       140       150       160       170       180
orf15-1.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
              |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf15ng       LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf15-1.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng       FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf15-1.pep   IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
              ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf15ng       IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN
                     250       260       270       280       290       300

                     310       320
orf15-1.pep   SHEGYGYSDEVVRQHRQGQPX
              ||||||||||:||||||||||
orf15ng       SHEGYGYSDEAVRQHRQGQPX
                     310       320



Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane
lipoprotein lipid attachment site, as predicted by the MOTIFS program) and indicates a
putative leader sequence, and it was predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
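The position of the ILSAC motif within the N-terminal region can be located with a simple string search. The following Python sketch is only an illustration and does not reproduce the MOTIFS program: the helper name `find_motif` is hypothetical, and a literal match to ILSAC is assumed rather than a general lipoprotein attachment-site pattern. The inputs are the first 30 residues of the three proteins listed above.

```python
def find_motif(protein: str, motif: str = "ILSAC") -> int:
    """Return the 1-based start position of the motif, or 0 if absent."""
    return protein.find(motif) + 1


# First 30 residues of each protein, taken from the listings above.
n_termini = {
    "ORF15-1": "MQARLLIPILFSVFILSACGTLTGIPSHGG",
    "ORF15a": "MQARLLIPILFSVFILSACGTLTGIPSHGG",
    "ORF15ng": "MRARLLIPILFSVFILSACGTLTGIPSHGG",
}

for name, seq in n_termini.items():
    start = find_motif(seq)
    # The cysteine is the last residue of the five-residue motif.
    cysteine = start + len("ILSAC") - 1 if start else 0
    print(name, "motif start:", start, "cysteine position:", cysteine)
```

In all three sequences the literal motif is found at the same position, which places the cysteine within the first twenty residues.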

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 85>:

1 ..GG.CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA
51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT
101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT
151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC
201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG
251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA
301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT
351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT
401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT
451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA
501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG
551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 

1 ..GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV
51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH
101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV
151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL*

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC
51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC
151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC
201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA
251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA
301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG
401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG
601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA
751 Tc.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT
801 GCTTTAA

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
251 XFGIMLLLIA GKMLYNLL*

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical H. influenzae transmembrane protein HI0902 (accession number P44070) 
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap:

ORF17    3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59
           HK * + V + P ++ VF G F + +IF +++L ++ D
HI0902  72 HKLGNIVWQAVRILAPVIMLSVFICGLFIGRLDREISAKIFACLWYLATKMVLSIKKD- 130

ORF17   60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119
           Q ++ L L + L G SS GIGGG VPFL G +AIG+S+ +
HI0902 131

ORF17  120
           +SG S+^++G +PE SLG++YLPAV ++A-+ + LG
HI0902 190

ORF17  180
           F + L-++A M
HI0902 250



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N. 
meningitidis: 

50 10 20 30 

orfl7.pep . GQHKKQAVNGKT V FT MM PGMI FGVFTGA FS 

I I M I I 1 I : I I I 1 I M I M : I I I I : I I : I 
orfl7a QGLAQHPYAQHLA VGTSFAVMVFTAFSSML GQHKKQAVDWKT VFTMMPGMVFGVFAGA LS 
50 ~~ 60 70 80 90 100 






40 50 60 70 80 90 

orf!7 .pep AKY I P AFG LQI FFILFLTAVAF KTLHTDPQTASRPLPGLPXLTAV5TLFGTMS 5 WVG I GG 
I I I It I I I I I I I I 1 I I I INI I I I II 1.1 I I I J II I I I I I I I ! I ! I I I M, I I I I I I I I | | 
orf 1 7a AKYI P AFGLQI FFILFLT AVAF KTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG 






100 110 120 130 140 150 

orf 17 pep GSLSVFFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 
I | | | | | M | I | M I I I M I M I I I M I I I I I I I I I I I II I I I 1 I I I I I It I I I I I I I I I 1 
orf 17 a GSLSVPFLIHCGFPAHKAIGTSSGIAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 
170 180 190 200 210 220 






160 170 180 190 

orf 17 . pep AVLSAATIAFAPLGV KTAHKLSSAKLKKS FGIMLLLIAGKMLYNLL X 
I I I I I t I I I I ! I M I I I I I I I I M I I I I I I M I I I II 11 I I I I II I I 
o r f 1 7 a AVLSAAT I AFAPLGV KT AHKLS SAKLKKS FGIMLLLI AGKMLYNLL X 

230 240 250 260 

The complete length ORF17a nucleotide sequence <SEQ ID 89> is:

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC
51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC
151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC
201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA
251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA
301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG
401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG
601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT
801 GCTTTAA



This encodes a protein having amino acid sequence <SEQ ID 90>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
251 SFGIMLLLIA GKMLYNLL*



ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap:


10 20 30 40 50 60 

orf 17a . peD MWHWDIILILLAVGSAAGF1AGLFGVGGGTLIVPWLWVLDLQGLAQHPYAQHLAVGTSF 
I I I I i I I II M I I I I I I II I I I I I I I I I I I I I I I II II I I II I I I I I I I I I II I I I II M 
orf 17-1 MWHWDI ILILLAVGSAAGFIAGLFGVGGGTLIVPWLWVLDLQGLAQHPYAQHLAVGTSF 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 17a .pep AVMV FTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYI PAFG LQI FFILFLT 

I I I I I I I I I I I I I I I M II I M I I t I I M I I I : M II : I I I M M III I I I I I I I I M I I 
orf 17-1 AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQI FFILFLT 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 17a. pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 
I I I I I I M I I I I I I M I 11 I I I I I I I I I I I I I I I II I I II I M I I I I I I I M I I I I I I I I 
orf 17-1 AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 17a. pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA 
I I I I I I I I I I I I I I I I ! II I I I I I I I I I I I I I I I I I I ! II I I I I I I I I I I I I I I I I! I I I 
orf 17-1 IGTSSGLAW PI ALSGAISYLLNGLN I AGLPEGSLGFLYLPAVAVLSAAT I AFAPLGVKTA 

190 200 210 220 230 240 



                    250       260
orf17a.pep   HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
             |||||||||| ||||||||||||||||||
orf17-1      HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
                    250       260



Homology with a predicted ORF from N. gonorrhoeae 

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N.
gonorrhoeae:

orf17.pep                               GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS 30
                                        ||||||||: ||:|:||||||||||:||:|
orf17ng   QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102

orf17.pep AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90
          |||||||||||||||||||||||||||  ||||||||||| |||||||||:|||||||||
orf17ng   AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162

orf17.pep GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150
          ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf17ng   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 222

orf17.pep AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL 196
          |||||||||||||||||||||||||||:||||||||||||||||||
orf17ng   AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268



An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid
sequence <SEQ ID 92>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL
151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE
251 SFGIMLLLIA GKMLYNLL*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>:

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC
51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cgacgGTACG CTGATTGTCC
101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC
151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC
201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA
251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA
301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG
401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG
601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT
801 GCTTTAA



This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>:

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA
101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL
151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE
251 SFGIMLLLIA GKMLYNLL*



ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap:

                      10        20        30        40        50        60
orf17-1.pep   MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1     MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf17-1.pep   AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
              ||||||||||||||||||||||||:|:||||||||||:||||||||||||||||||||||
orf17ng-1     AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf17-1.pep   AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
              |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf17ng-1     AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf17-1.pep   IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
              ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf17ng-1     IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                     190       200       210       220       230       240

                     250       260
orf17-1.pep   HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
              |||||||||: ||||||||||||||||||
orf17ng-1     HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                     250       260

In addition, ORF17ng-1 shows significant homology with a hypothetical H. influenzae protein:

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 15/43 (34%), Positives = 23/43 (53%)

Query: 55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
          A+GTSFA +V T S HK + W+ + + P ++ VF
Sbjct: 52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 44/114 (38%), Positives = 65/114 (57%)

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209
           L G SS GIGGG VPFL G +AIG+S+ + +SG S++V+G +
Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263
           PE SLG++YLPAV ++A + + LG KL + LK+ F + L+ ++A M
Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIWAINM 261

50 

This analysis, including the homology with the hypothetical H. influenzae transmembrane protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 12 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 95>:

1 ..GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC
51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG
101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG
151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT
201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC
251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG
301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA
351 A

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 ..GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA
101 LMQVSVLVLL LSEIGR*

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT
51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA
101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC
151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA
201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA
251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT
301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC
351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG
401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG
551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA
601 AGATAA

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG
201 R*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N.
meningitidis:

10 20 30 

orf 18 . pep GNGWQADPEHPLLGLFA VSN VSMTLAFVGI 

I I I I I I I I t I I ! I I I I I I I I I I I II I I i I I 
. orf 18a. TRA AP LFI PHFYLTLG5 1 FFFI GHWNRKTDGNGWQADPEH PLLGLFA VSN VSMTLAFVGI 

35 60 70 80 90 100 110 

40 50 60 70 80 90 

orfl8.peo CALV HYCFSGTVQVFVFAALLKLYALK PVYWFVLQFVLMAVAW HRCGIDRQPPSTFGGS 
MINIM! ! I I I I I I I I I 1 I I I I I I I I I I 1 I I I I I I I I M ! I I I I I I I I I I I I I ! I! 
40 orf 18a CALV HY CFSXTVQVFVFAALLKL YALK PVYWFVLQFVLMAVAYV HRCGI DRQPPSTFGGS 

120 130 140 150 160 170 

100 110 
orf 18 . peo QLRLG GLTAALMQVSVLVLLLS EIGRX 
45 " M | I M M I M II I M I M I M M I I 

orfl8a QLRLG GLTAALMQXSVLVLLLS EIGRX 
180 190 200 

The complete length ORF18a nucleotide sequence <SEQ ID 99> is:

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT
51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA
101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC
151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA
201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA
251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG
301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC
351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG
401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG
551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA
601 AGATAA

This encodes a protein having amino acid sequence <SEQ ID 100>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG
201 R*



ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap:


10 20 30 40 50 60 

orfl8a pet> MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP 
M I I i 11 I I I I M I M I M M II I I I ! M t I I I I I I I I M I I I I I II I II I I I I 1 I I I i 1 
orf 18-1 MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orfl8a pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH 

MM M I M I I 11 It I I II I I I I I I II I I I I I I M I I I I I I I II I I I I I I I I I I 

orfl8-l LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 18a. pep YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG 

MM I I I I I I I I II I I I I Ml ! M M I II I I I I I I I I M I M M M I ! I M II I 

orfl8-l Y C FS GT VQV FV FAALLKL Y ALK P VY W FVLQ FVLMA VA Y VHR CG I DRQP P S T FGG S QLR LG 

130 140 150 160 170 180 

190 200 
orf 18a. pep GLTAALMQXSVLVLLLSEIGRX 
II I M 1 II I M II I M II II I 
orf 18-1 GLTAALMQVSVLVLLLSEIGRX 

190 200 

Homology with a predicted ORF from N. gonorrhoeae 

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N.
gonorrhoeae:

orf18.pep                               GNGWQADPEHPLLGLFAVSNVSMTLAFVGI 30
                                        ||||||||||||||||||||||||||||||
orf18ng   TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115

orf18.pep CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 90
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 175

orf18.pep QLRLGGLTAALMQVSVLVLLLSEIGR 116
          ||||| |:| ||||:| ::||:||||
orf18ng   QLRLGVLAAMLMQVAVTAMLLAEIGR 201



The complete length ORF18ng nucleotide sequence is <SEQ ID 101>:

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt
51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA
101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG
151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA
201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA
251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT
301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC
351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG
401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG
451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA
501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG
551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC
601 AGATGA

This encodes a protein having amino acid sequence <SEQ ID 102>:

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG
201 R*
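
The relationship between a nucleotide listing and the protein it encodes can be illustrated with a small translation routine. This is a hedged sketch using the standard genetic code, not the software used to prepare the sequence listings; the names `translate` and `translate_codon` are hypothetical. It tolerates the lower-case letters and IUPAC ambiguity codes (N, Y, R and so on) that appear in the listings, and returns X for any codon that cannot be resolved to a single residue.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes expanded to the bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; ambiguous codons give X unless every expansion agrees."""
    try:
        options = [IUPAC[base] for base in codon.upper()]
    except KeyError:  # unreadable character such as '.'
        return "X"
    residues = {CODON_TABLE["".join(p)] for p in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate a listing in frame 1, ignoring whitespace and any trailing partial codon."""
    dna = "".join(dna.split())
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


# First 30 nucleotides of SEQ ID 101 give the first 10 residues of SEQ ID 102.
print(translate("ATGATTTTGC TGCATTTGGA TTTTTTGTCT"))  # MILLHLDFLS
```

Because a codon such as CTN encodes leucine whichever base N stands for, it is reported as L rather than X, which is consistent with how the partial sequences in this document pair undetermined nucleotides with determined residues.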

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1:

                    10        20        30        40        50        60
orf18-1.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||||||||
orf18ng     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18-1.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf18ng     LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18-1.pep YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18-1.pep GLTAALMQVSVLVLLLSEIGRX
             |:| ||||:| ::||:|||||
orf18ng     VLAAMLMQVAVTAMLLAEIGRX
                   190       200

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
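
The text does not say how the putative transmembrane domains were identified. As an illustration only, one common way to flag candidate membrane-spanning segments is a sliding-window hydropathy scan. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a mean-hydropathy cutoff of 1.6; the function names and parameter values are assumptions for this example and are not taken from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[tuple[int, float]]:
    """Return (0-based start, mean hydropathy) for every full window; unknown residues score 0."""
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    return [
        (i, sum(scores[i:i + window]) / window)
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(
    seq: str, window: int = 19, threshold: float = 1.6
) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent windows at or above the threshold into 1-based (start, end) spans."""
    segments: list[tuple[int, int]] = []
    for i, mean in hydropathy_windows(seq, window):
        if mean >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Synthetic check: a 19-residue hydrophobic core flanked by charged residues.
print(candidate_tm_segments("D" * 10 + "L" * 19 + "D" * 10))  # [(6, 34)]
```

Passing a listed protein sequence with its spaces removed (for example SEQ ID 102) to `candidate_tm_segments` returns the spans whose average hydropathy exceeds the cutoff. Such spans are candidates only, and a dedicated topology predictor may give different boundaries.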

Example 13 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 103>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT
151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC
201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA
301 GGCGCGGNCG ...

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 
51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 
101 GAX . 

Further work revealed the complete nucleotide sequence <SEQ ID 105>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC
201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA
301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC
451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA
501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG
551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC
751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA
951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT
1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC
2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG
2151 A

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL
151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE
351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289)
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap:

orf19   6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65
          L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T
YHFK    5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64

orf19  66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102
          +  F++SS   Q  +G  + +I+ MT++T  FT++GA
YHFK   65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101






Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
            |||| ||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19a      MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100
orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX
            |||:||||||||||:|||||||||||||||||||  |||:||
orf19a      NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

orf19a      TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
                   130       140       150       160       170       180



The complete length ORF19a nucleotide sequence <SEQ ID 107> is:

1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT
251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA
501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG
551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG
851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC
901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA
951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA
1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA
1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT
1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC
1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC
1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT
2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151 A

This encodes a protein having amino acid sequence <SEQ ID 108>:



1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE
351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*



ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap:

                    10        20        30        40        50        60
orf19a.pep  MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf19a.pep  NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
            |||:||||||||||:||||||||||||||||||||||||:||||||||||||||||||||
orf19-1     NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf19a.pep  TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
            |||||||||||||||||||||||||||||:||||:||||||||:|||||:|||:||||||
orf19-1     TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf19a.pep  DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf19a.pep  DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf19a.pep  RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf19-1     RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf19a.pep  ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
            ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf19a.pep  CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf19a.pep  STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf19a.pep  AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf19a.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
                   610       620       630       640       650       660

                   670       680       690       700       710
orf19a.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710

Homology with a predicted ORF from N. gonorrhoeae 

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N.
gonorrhoeae:

orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK  60
            |||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19ng     MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK  60

orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103
            |||:||||||||||||||||||||||||||||||  ||||||
orf19ng     NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 120

An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino
acid sequence <SEQ ID 110>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT *

Further work revealed the complete nucleotide sequence <SEQ ID 111>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA
501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG
551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG
851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA
901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA
951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca
1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa
1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT
1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC
1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC
1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT
1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA
1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA
1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA
1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT
2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151 A



This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*



ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap:

                    10        20        30        40        50        60
orf19-1.pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf19-1.pep NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1   NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf19-1.pep TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA
            |||||||||||||||||||||||||||||:||||:||||||||||||||:||||||||||
orf19ng-1   TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf19-1.pep DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1   DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf19-1.pep DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG
            |||||||||||||||||||||||||||||:||||||||||||||||:|::||||||||||
orf19ng-1   DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf19-1.pep RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA
            |||||||||||||||:||||||||| ||||||||||||||||:|:   ||||||||||||
orf19ng-1   RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf19-1.pep ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
            ||||:|:||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf19ng-1   ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf19-1.pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF
            |||||||||||| ||||||||||||||||||||||||||||||||:||||||||||||||
orf19ng-1   CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf19-1.pep STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19ng-1   STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf19-1.pep AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
            ||||:|:||:||:||||:||||||:||| |||||||||||||||||||||||||||||||
orf19ng-1   AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf19-1.pep PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
            |||||||||||||||||||||||||||||||||||||||||||||||||||||:  ||||
orf19ng-1   PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF
                   610       620       630       640       650       660

                   670       680       690       700       710
orf19-1.pep QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            ||||||||||| ||||:||||||||||||||||||||||||||||||||||||||||
orf19ng-1   QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710

In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein
previously entered in the databases:

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438
(AJ002423) hypothetical protein [Neisseria gonorrhj Length = 417
Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203

Identities = 301/326 (92%), Positives = 306/326 (93%)

Query: 307 RQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 366
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS
Sbjct:   1 RQSLRLLSDGNDSXDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 60

Query: 367 FKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFVCQPNYT 426
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT
Sbjct:  61 FKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTRLFVCQPNYT 120

Query: 427 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 486
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT
Sbjct: 121 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 180

Query: 487 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 546
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG
Sbjct: 181 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 240

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632
           K   ALTGYISALG   ++  +  +P
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326
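
The percentages on the summary line of the database hit can be recomputed from the raw counts. This is a small illustrative sketch; the helper name `parse_blast_counts` is hypothetical and the regular expression only targets the "Identities = a/b, Positives = c/d" wording shown above. It reports both the whole-number percentage obtained by integer division and the exact value, since the printed figures look truncated rather than rounded (306/326 is about 93.9%, printed as 93%).

```python
import re


def parse_blast_counts(line: str) -> dict[str, tuple[int, int, int, float]]:
    """Return {label: (count, total, truncated percent, exact percent)} for a summary line."""
    result = {}
    pattern = r"(Identities|Positives)\s*=\s*(\d+)/(\d+)"
    for label, count, total in re.findall(pattern, line):
        count, total = int(count), int(total)
        result[label] = (count, total, 100 * count // total, 100.0 * count / total)
    return result


summary = "Identities = 301/326 (92%), Positives = 306/326 (93%)"
for label, (count, total, truncated, exact) in parse_blast_counts(summary).items():
    print(f"{label}: {count}/{total} -> {truncated}% (exact {exact:.2f}%)")
```

The five positions that count as positives but not identities correspond to the five "+" characters in the middle lines of the last two alignment blocks.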



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
113>: 



1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCQAGTT
351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT
401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT
451 ACTCA^TTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC
501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT
551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA
601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC
651 CCAA^CtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG
701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA
751 CACGATTTTC GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT
801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC
851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC
901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc
951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG
1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC
1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA
1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC
1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs
1201 CTTTAyCGGC CCACTrrAAC rCasTCGGAC TTTCGCTTGC CATCGGTCTG
1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG
1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT
1351 GcTCTCGCTC GCCGTGA


This corresponds to the amino acid sequence <SEQ ID 114; ORF20>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ
201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX
401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC
451 SRSP*

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is:

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT
351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA
401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA
451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT
501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC
551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA
601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC
651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG
701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA
801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT
951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG
1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT
1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG
1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC
1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG
1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA
1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC
1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC
1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL
501 GFRPRHFKRV EN*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the MviN virulence factor of S. typhimurium (accession number P37169)

ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

Orf20   1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
          MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
MviN   14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Orf20  61 AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
          +QAFVPILAEYK  + +EA   F+ +V+G+L+  L +VT  G+LAAPWVI V+AP FA
MviN   74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
          ADKF L+  LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF  P
MviN  134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
          YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D    RV+KQM PAILGV
MviN  194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
          SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK  A+ +
MviN  254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
          +++  L+DWGLRLC LL LP+AV L +L+ PL  +LF Y  FT FDA MTQ ALIAYS G
MviN  314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXXXXXXXXXXXXXXXXXCI 420
          LIGLI++KVLAPGFY+RQ+I  PVKIAI TLI  QLMNL F                 C+
MviN  374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Orf20 421 NAGLLFYLLRRHGIYQPXQG 440
          NA LL++ LR+  I+ P  G
MviN  434 NASLLYWQLRKQNIFTPQPG 453

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf20 pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
| | ! | | M : I I I I I I I I I II I I II I I M I I I I I I I I I I I I I I 1 I M 11 I I I I I I I I I I II I 
or f 20a MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

20 - iQ 20 30 4Q 5Q 60 

70 80 90 100 110 120 

orf20 pep AOAFVPTIAEYKETRSKEAXEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPSFAQD 
I I I i I I I M I 1 I I I I I M I : I I I I M I I i M I I I I I II I I I I I I I I I I I 1 I I I I I : I I : I 
25 or f 2 0a AOAFVPIIAEYKETRSKEATEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPGFAKD 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf20 pep ADKFOLS IDLLRIT FPYILLI SLSSFVGSVL NSYHKFGI PAFTPX FLNVSFIVFALFFVP 
30 M I I I I I I I I I I I I M I I M I I I I I I I M I I I I I I I I : I I I I I I : I I I I II I I I I II I I I 

or f 20a ADKFOLS IDLLRIT FPYILLI SLSSFVGSVL NSYHKFSI PAFTPT FLNVSFI VFALFFVP 

130 140 150 160 170 180 

190 200 210 220 230 240 

35 orf20 pep YFDPP VTAXAWAVFVGGILQLX FQLPWLAKLGFLKLPKLSFKDAAVNRVMKQ MAPAILGV 

| j I I I I I 1 | i | I I I I I I I I I I I I I i I I I II I I M I I I Ml I I M I I M I II I I I 11 I 1 
or f 20a YFDP P VTALAWAVFVGG I LQLG FQL PWLAKLG FLKLPKLS FKDAAVNRVMKQ MAPAI LGV 

190 200 210 220 230 240 

40 250 260 270 280 290 300 

orf 20 . pep SVAQVSLVI NTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

TTTTTTTTT I I M I I I I I M 1 1 1 M I 1 1 1 1 M I I I : I I I M 1 1 1 1 M i I l l I l l M I 1 1 I 

or f20a SVAQISLVI NTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

45 

310 320 330 340 350 360 

orf20 pep EOFSALLDWGLR LCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQHA LIAYSFG 
| j | | I | I I } I 1 ! I I I I I M 1 I I I : I I ! I M 11 I II M I I I I I I I I I I I II I I I I I M I 
o r f 2 0 a EQFS ALLDWG LR XCMLLTLPAAVGMAVLS FPLVATLFMYRE FT L F DAQMT QHA L I AY S FG 

50 310 320 330 340 350 360 

370 380 390 400 410 420 

orf 20 . pep LIGLIMIKVL APGFYARQNIXXPVK IAIFTLICXQLMNLXFX GPLXXIGLS LAIGLGACI 
I | M I I I I I I I I I I I I 1 1 I I : I I II I I I I I I I : M I M I III : I I I M I I I I I! 1 
55 orf 20a LIGLIMIKVL APGFYARQNIKTPVK IAI FTLICTQLMNLAFI GPLKHVGLS LAIGLGACI 

370 380 • 390 400 410 420 

430 440 450 

orf 20 . peD NAGLLFYL LRRHG I YQPXQG LGSVLXQKCCSRSPX 
60 I I I I I I I I I I I I I I I I I : I :: I : 

o r f 2 0 a NAGLLFYL LRRHG I YQPGKGW AAFLAKMLLS LAVMGGGL YAAQ I W L P FDW AHAGGMQKAA 

430 440 450 460 470 480 

The complete length ORF20a nucleotide sequence <SEQ ID 117> is: 

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT
351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA
401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA
451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT
501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC
551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA
601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC
651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG
701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA
801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT
951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG
1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT
1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG
1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC
1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG
1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA
1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC
1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC
1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA


This encodes a protein having amino acid sequence <SEQ ID 118>:

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL
501 GFRPRHFKRV ES*
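The relationship between a nucleotide listing and the protein it encodes can be checked with a short translation routine. The following is a minimal sketch in Python, assuming the standard genetic code and assuming that any codon containing an ambiguous base (such as N) is reported as X, which appears to be the convention used in these listings. The names `CODON_TABLE` and `translate` are illustrative and are not part of the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored.

    Codons that are not in the table (for example, those containing N)
    are reported as 'X'. A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 bases and last 9 bases of the ORF20a nucleotide listing.
print(translate("ATGAATATGC TGGGAGCTTT GGTAAAAGTC"))  # MNMLGALVKV
print(translate("GAAAGCTGA"))                          # ES*
```

The two printed fragments agree with the first ten residues and the last residues of the ORF20a protein listing, with `*` marking the stop codon.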



ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap: 



10 20 30 40 50 60 

orf 20a . pep MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
I | I I | M : M M I II I I I I M I I I I I I M I II M I I I I I I I I I II I I I I I II I I I I I I I I 
orf 20-1 ^MLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 20a . pep AQAFVP I LAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALG I LAAPWV I YVSAPGFAKD 

I I I I I I I I I I I I I I I I I I I : I I I I II I I I I I I I I I I I I I I I I I I I I I I I I M I 11 I I I : I 
orf 20-1 AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 20a. pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP 
I I I I I I I I I I.I II I I I I I M II I I I I II II I I I I I I I : I I I I I I I I I I I I I I I I I II I 1 I 
orf 20-1 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGI PAFTPTFLNVS FIVFALFFVP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 20a . pep Y FD PPVT ALAWAV FVGG I LQLG FQL PWLAKLGFLKLPKLS FKDAAVNRVMKQMAPAI LG V 

I || I I it I M I I I II I I I I I I I I II I I I II I I I I I I I I I I I I I It I M I I 1 I I I I I M I I 
orf 20-1 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 

190 200 210 220 '230 240 

250 260 270 280 290 300 

orf20a.pep SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT



I | | | : | | | | M | M I I I I I I II II M I I ! I I I I I I : I I I M M I I I I M I I I I M I I II I 
orf20-l SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf20a pep EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

Mi MM I I I I I II I I I I : I I I I I M I I I I I I I I I I M I I I I I I I I I I I 1 I I I I 

orf20-l EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf20a pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
I | | | | I | I M M II I II I I I II I M II M I I I I II I M M M I M M I I I II M I M II I 
or f 2 0-1 LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf20a pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA 
M M M M M M II I I I I I I M I I I I M I I I II I I MMIII : I I I : I I I II I I : II : 
orf20-l NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 

430 440 450 460 470 480 

490 500 510 

orf20a .oep RLFI LI AVGGGLY FAS LAALGFRPRH FKRVESX 

: ! t I I I I M M I M II I I M I II M I I M I M 
or f 20-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 

490 500 510 
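The identity figures quoted for these alignments are the fraction of aligned positions at which the two sequences carry the same residue. As a minimal sketch in Python, assuming an ungapped alignment of equal-length strings (the function names are illustrative), the calculation for residues 61-120 of ORF20a and ORF20-1 looks like this:

```python
def count_identical(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percent identity over the full length of an ungapped alignment."""
    return 100.0 * count_identical(a, b) / len(a)


# Residues 61-120 of each protein, taken from the alignment above.
orf20a = "AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD"
orf20_1 = "AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD"

print(count_identical(orf20a, orf20_1), len(orf20a))  # 58 60
print(round(percent_identity(orf20a, orf20_1), 1))    # 96.7
```

This window differs at two positions (T/A and K/Q). Applying the same function to the full 512-residue sequences would give the overall figure, provided the alignment contains no gaps.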



Homology with a predicted ORF from N. gonorrhoeae

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N. gonorrhoeae:

orf 20 . pep 
orf20ng 
orf20 .pep 
orf20ng 
orf 20 .pep 
orf20ng 
orf 20 . pep 
orf 20ng 
orf 20 .pep 
orf 20ng 
orf 20 . pep 
orf 20ng 
orf20.pep 
orf20ng 
orf 20 . pep 



MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMAT DAFFVAFKLPNLLRRVFAEGAF 60 

| M |[ I I I I I I M M 1 I I I I I I I II I I M I II I I I I I I I I I I M I I I M M I I II I I I I I 
MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

AQAFVF I LAE YKETRSKEAXEAFI RHVAGMLS FVLVI VTALG I LAAPWVI YVS APS FAQ 2 120 
| | M I M M M I M M I I I : I M I I I I I I II I I I I : M II II II I I I I I M II I I : I : : I 
AQAFVPILAEYKETRSKEATEAFI RHVAGMLS FVLIVVTALG I LAAPWVI YVS APGFTKD 120 

ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180 

| | | | | M I : I I I I II I I II I I I I I I M I I : I I M I I I I M 11 M : II I : I I M I I I I M I 
ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180 

YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240 
| M 1 | M | | | | | M I I I I I I I M ! I I I! I 1 M II I I I M I M I I M II II I M M II I 
Y FD P P VTALAW AV FVGG I LQLG FQL PWLAKLG FLKL PKLN FKD AAVNR VMKQMAPAI LG V 24 0 

SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300 

| | | | : | | | | I | I M II I I I 1 II I I M M I I I I I I I :\ I I I I I I I M I I I II II I I M I I I 
SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300 

EQFSALLDWGLRLCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQHALI AYS FG 360 
I I I I I I 1 M I I M I II I I I II 1 : I I I I I I I I I II I M II I I I II M I! I II I I II I M I 
EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360 

LIGLIMIKVLAPGFYARQNIXXPVKIAI FTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 4 20 
I M I M I I I I 1 M M I I I I : I M I II II I I I M II M I III I II M I I I I I I I 
LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 4 20 

NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 4 54 
I I I I ! i : I : I : M I I : I III!: M I I I I I I 
NAGLLFFLFRKHG I YRPGQGLGQPS WRKCCSRS P 454 



orf 20ng NAGLLFFLFRKHG I YRPGQGLGQPS WRKCCSRS P 4 54 

An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino acid sequence <SEQ ID 120>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA
101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI
151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC
451 SRSP*

Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>: 



1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA 

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG 

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtCCg CgcccGGCTT 

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA
451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT
501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC 
551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG 
601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 
651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG 
701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT* GgttATCAAC 
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta
801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG 
851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 
951 GhCGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC 

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG
1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA 
1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC 

1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 
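Listings in this layout (a position number followed by groups of ten characters) can be turned back into a continuous sequence by discarding the position numbers and the spacing. The following is a minimal sketch in Python; `parse_listing` is an illustrative helper, and it assumes that any token consisting only of digits (optionally with a stray trailing mark such as `.` or `'`) is a position number rather than sequence data.

```python
def parse_listing(text: str) -> str:
    """Join the sequence groups of a numbered listing into one string.

    Tokens made up only of digits are treated as position numbers and
    skipped, so a number that was split in two (e.g. "14 51") is also
    dropped. Letter case is preserved.
    """
    groups = []
    for line in text.splitlines():
        for token in line.split():
            if token.strip(".'").isdigit():
                continue  # position number, not sequence
            groups.append(token)
    return "".join(groups)


# Last two rows of the listing above, with one position number split in two.
rows = """
14 51 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA
"""
tail = parse_listing(rows)
print(len(tail))   # 89
print(tail[-3:])   # TGA
```

For a complete open reading frame, the length of the joined sequence should be three times the number of residues in the encoded protein plus three for the stop codon (here 3 x 512 + 3 = 1539 bases).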

This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA
101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI
151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL
451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL
501 GFRPRHFKRV ES*

ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:

10 20 30 40 50 60 

orf 20-1 . pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMAT DAFFVAFKLPNLLRRVFAEGAF 
I | | I I I I I I I I I I I 1 I I I M I I I I I I II I I I I I I I I I I I I I I I 11 I M I I I 1 II I I I 1 I I 
orf20ng-l MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMAT DAFFVAFKLPNLLRRVFAEGAF 

10 20 . 30 40 50 60 



70 80 90 100 110 120 

orf 20-1 . peD AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 
I I I I I I I I I I I I I I i I I I I : I I M I I I II I I M I I : I I II I I I I I M I I I I I I 1 I I : : I 
orf20ng-1 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD

70 80 90 100 110 120



130 140 150 160 170 180 

orf20-l oeD ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP 
* F | | ! I | I |(: I M I I M I M M I I I M I I !: I i I M I I M M M M I I I : I I I I I I M I M 

orf20na-l ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 
9 130 140 150 160 170 180 

190 200 210 220 230 240 

or f 20-1 pep yfdppvtalawavfvggilqlgfqlpwlaklgflklpklsfkdaavnrvmkqmapailgv 
I M I I M I I f M M I I M I I I I 1 I II I i I I 1 M I I I! I i : M I I I M I I I I I i I 1 I I I M 
orf20nq-l YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 

190 200 210 220 230 240 

250 260 270 280 290 300 

or f 20-1 Dep SVAQVSLVINTI FASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

I I I I : I I I M I I I I I II I I M I M It I I M I I I I M I I I M M i I I I I I I II I 

orf20na-l S VAQI SLVINTI FAS YLQSGS VSWMY YADRMMELRRGVLGAALGT I LLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf20-l pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
I I I M I I I I I I I II I I M I I I I : I M I I t I M I I I I I I I I I I I I I M M I I I I I i I I I I I 
o-f20ng-i EQFSALLDWGLRLCMLLTLPAAAGIAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 • 410 420 

O*-f20-l peo LIGLIMIKVLAPGFYARQNIKT PVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

I M I I I I 1 I I I | | I I I I I I I I I I I I I I M M I I II I I I I I i I I I I I : I I II I II I I I I I 
or^20nc-^ LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f20-^ pep NAGLLF YLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 
I | ! M I : I I 1 : I I M : I i : M 1 I I M I I I I : I I M I I I I M I I I I I I M I II I I I I I I I 
orf20ng-l NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG 

430 440 450 460 470 480 






490 500 510 

orf 20-1 . DeD QLCILIAVGGGLYFASLAALG FRPRHFKRVENX 
I I I I I I II I II I 1 I I I I M I I I I I I I I I I I I : I 
orf20nc-l Q LC I L I AVGGGL Y FAS LAALG FR PRH FKRVE S X 

490 500 510 

In addition, ORF20ng-1 shows significant homology with a virulence factor of S. typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|61005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524

Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%)

Query:   1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
           MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct:  14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Query:  61 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120
           +QAFVPILAEYK  + +EAT  F+ +V+G+L+  L VVT  G+LAAPWVI V+APGF
Sbjct:  74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Query: 121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180
           ADKF L+  LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF  P
Sbjct: 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Query: 181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240
           YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D    RV+KQM PAILGV
Sbjct: 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Query: 241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300
           SV+QISL+INTIFAS+L SGSVSWMYYADR+ME   GVLG ALGTILLP+LSK  A+ +
Sbjct: 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360
           +++  L+DWGLRLC LL LP+A  L +L+ PL  +LF Y +FT FDA MTQ ALIAYS G
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420
           LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467
           NA LL++ LRK  I+ P  GW            VM   L+     +P
Sbjct: 434

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%)

Query: 469
           EW+ + + +L ++ G YFA+LA LGF+ + F R
Sbjct: 481



Based on this analysis, including the homology with a virulence factor from S. typhimurium, it is
predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>:

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA
51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG
101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC
151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT
201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA
251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC
301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA
351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC
401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC
451 GTCAATGCGA tGGACACCAA TCCG..

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNP..

Further work revealed the complete nucleotide sequence <SEQ ID 125>:

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA
51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG
101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC
151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT
201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA
251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC
301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA
351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC
401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC
451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT
501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT
551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG
601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC
651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA
701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT
751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG
801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG
851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT
901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT
951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG
1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT
1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC
1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG
1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC
1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA
1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC
1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA



This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP
201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR
251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI
301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR
351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV
401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*



Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>: 



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA
51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG
101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC
151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT
201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA
251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC
301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA
351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC
401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC
451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT
501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT
551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG
601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC
651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA
701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT
751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG
801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG
851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT
901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT
951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG
1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT
1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC
1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG
1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC
1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA
1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC
1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA



This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 



1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA
51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP
201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR
251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI
301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR
351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV
401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG*



The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 

10 20 30 40 50 60 

orf 22 . pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDA VKKGQVLFED 
I I I I I II I I I I I I I I I I I : : I I I I : I II I II I M I I I I I 1 I M I I i I I I I I I | | | |l I I 
orf 22a MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 
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10 20 30 40 50 60 

70 80 90 100 110 120 

orf 22 . pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 

I I I I I I i I I I : ! I II I I I I I II I I I I M I I i I I I I I I I I M I II I I! I I I I I I I 
orf 22a KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 

70 80 90 100 110 120 



130 140 150 

orf 22 .pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 
I I II I i I I M I I : I I I II M I II M I I I I I M I I II I I 
orf 22a NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 

130 140 150 160 170 180 

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 

10 20 30 40- 50 60 

orf 22a . pep M I K I KKGLN L P I AGR PEQV I Y DG P V I TE VALLGEE YAGMR PXMKVKEG D AVKKGQVL FE D 

I I I II i I I I I I I I II I I I : : II I I : I I I I II II I M I I M I I I I I I I I I I I I I I I I I I I 
orf 22-1 M I K I KKG LN L P I AG R P E Q A V Y DG PA I T E VAL LG E E Y AG MR P S MKVKE G D A VKKG Q V L FE D 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 22a . pep KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 

M I I I I I I I I : I I I I M I I I I I I I I I I II I I ! 11 I I I I I I M I I I I I I I I I I t I I 
orf22-l KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 22a . pep NLIQ5GLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 
1 II t I II I I I I I : II I I II II II II I I I I I I I I I I I 1 I I I I I I : I : i I 1 I 11:1 I I 
orf 22-1 NLIQSGLWTALRTRPFSKIPAVDAEPFAIEVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 22a . pep LSRLTERKIHVCKAAGADVPSENAANIETHE FGGPHPAGLSGTHIHFIEPVGANKTWTI 

I i I I I I M I I I M I I I I II I 1 I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
orf 22-1 LSRLTERKIHVCKAAGADVPSENAANIETHE FGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 22a . pep NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADNRVI 
I I M I I : I 1 II I I 11 I I I I I! I I I I I 1 I I II I II I I I I I I I I I I I I M I II 1 I! : It I I I 
orf 22-1 NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 22a . pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
I | | | | M | | | M I M I I I I I M I I I I I 1 I M II I I I I I I I I I II 11 I I I I I 1 I I I I I I I I 
orf 22-1 SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340. 350 360 

370 380 390 400 410 420 

orf 22a . pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
M I I : I I I I I I I I I I I I I I I M 1 I I I 1 I I I I I I I I I I I I I I I I 1 1 I I I I I I II I I 11 I I I 
orf22-l LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

. 370 380 390 400 410 420 

430 440 
orf 22a . pep LCS FVCPGKYEXGPLLRKVLETXEKEGX 

I I ! I I! I I I I I I I I i I I I I I 1 Mil! 
orf 22-1 LCSFVCPGKYEYGPLLRKVLETIEKEGX 

430 440 

Further work identified a partial gene sequence <SEQ ID 129> from N. gonorrhoeae, which
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>:

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP
201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR
251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI
301 SGSVLNGAIA QGAHDYLGRY HN*

Further work identified the complete gonococcal gene <SEQ ID 131>:



1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA
51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG
101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC
151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT
201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA
251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC
301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA
351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC
401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC
451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT
501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC
551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG
601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC
651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA
701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT
751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG
801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG
851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT
901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT
951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG
1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC
1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC
1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG
1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC
1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA
1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC
1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA



This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:



1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP
201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR
251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI
301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR
351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV
401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*
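The relationship between the nucleotide listing <SEQ ID 131> and the protein listing <SEQ ID 132> can be illustrated with a minimal sketch, assuming the standard genetic code and translation in frame 1 from the first nucleotide. The function name `translate` and the table construction are illustrative only and are not part of the original disclosure; whitespace inside the listing rows is ignored, and any codon not found in the table is reported as X.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; trailing partial codons are ignored."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-nt blocks
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unrecognised codon
        for i in range(0, usable, 3)
    )


# First row (nucleotides 1-50) of the gonococcal ORF22 gene listing.
first_row = "ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA"
print(translate(first_row))
```

Translating whole rows in this way is one means of cross-checking residues in the protein listing against the corresponding codons.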



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa
overlap with ORF22ng:

orf 22 . pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60 

I | | | | i | | | I I I M I I I I : : I I I I M I I I M I I I M : M I I I I I : I I I : I II 1 I I I I I I I 
orf22ng MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60 

orf 22 . pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 120 

I I I I i I 11 M I I I I I I I i I M I M M I I I 11 I I I I I I 1 I I I I I I : I M I I : II : I : I I I 
orf 22ng KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 120 

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158 

I I I I 1 I I I I I M M I M II M 1 I I I II II I I 1 M 1 I I I 
orf22ng NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 180 
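The identity figures quoted for these alignments are ratios of identical positions to overlap length. The following is a minimal sketch of that calculation, assuming two pre-aligned strings of equal length; the function name `percent_identity` is hypothetical and the alignment program actually used is not reproduced here. The example uses only the first 60 aligned residues of ORF22 and ORF22ng shown above.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[int, float]:
    """Return (identical positions, percent identity) for two aligned strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A position counts as identical only if both residues match and
    # neither is a gap character.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches, 100.0 * matches / len(seq_a)


# Residues 1-60 of the ORF22 / ORF22ng alignment.
orf22 = "MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED"
orf22ng = "MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED"

matches, pct = percent_identity(orf22, orf22ng)
print(f"{matches}/{len(orf22)} identical ({pct:.1f}%)")
```

By the same arithmetic, the quoted 93.7% identity over a 158aa overlap is consistent with 148 identical positions (148/158).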






The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng) show 96.2% 
identity in 447 aa overlap: 



                     10        20        30        40        50        60
orf22-1.pep  MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED
             ||||||||||||||||||::||||||||||||||||:|||||||:|||:|||||||||||
orf22ng-1    MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED
                     10        20        30        40        50        60

70 80 90 100 110 120 

orf22-l pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 

I I M M I I I I I I I I I I I I M I M M I I I I I I I I 1 I I I I I : I I I I I : M : I : I I I 

orf22ng-l KKNPGVVFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 22-1 .pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 
I M I! I I II I I I I M I I I II I I I II I I I I I I I I M I I I II I I I I I I I I I I I I I M I I I I I 
orf22ng-l NLIQSGLWTALRTRPFSKI PAVDAE PFAI FVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 

15 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 22-1 .pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 
I | | | | M I I I I I M I I I I M I I I I I I I I I 1 1 I II I I M I I I I I I I I I I I I I I I II I I I I I 
20 orf22ng-l LSRLTERKIHVCPCAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 22-1 . pep NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 
25 * ' I I I I I I : I I I I I : I I II I 11 I I : I I M I I I I I 1 1 II I M I M I II : I I I I I I I : I I I 1 I 

orf22ng-l NYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAK^/SQLTAGELVDADNRVI 

250 260 270 280 290 300 

310 320 330 340 350 360 

30 orf 22-1 . pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

I | | | I If M : II I I I I I I I I I I I I I M I I I I I I I I M I I I I I I t I \ I II I I I I I II I I I 1 
orf22ng-l SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 

35 37C 380 390 400 410 420 

orf 22-1. peo LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
I I I 1 : | I I | I I I I I I I I I I II I I I I II M II i i M I I 1 I I I I I I I I I II I I I I I M I I 1 I 
orf 22ng-l LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 

40 

430 440 
orf 22-1 .peD LCSFYCPGKYEYGPLLRKVLETIEKEGX 
I I I I 1 I M I I I I I I I M I I I M I I I I I I 
orf22ng-l LCSFVCPGKYEYGPLLRKVLETIEKEGX 
45 430 440 

Computer analysis of these sequences gave the following results: 

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492). 
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap: 

Orf 22 1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60 
50 MI IKKGL+LPIAG P Q +++G + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED 

4 8kDa 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDWKKGQVLFED 60 

orf 22 61 KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 120 
KKNPGWFTAPASG + I-rRGEKRVLQSWI VE F RY LA+LS E+V++ 

55 48kDa 61 KKNPGWFTAPASGTWTINRGEKRVLQSWIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120 



60 



orf22 121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158 

NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP 
48kDa 121 NLIESGLWTAFRTRPFSKVPALDAI PSSIFVNAMDTNP 158 

ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449

Score = 530 bits (1351), Expect = e-150
Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)

Ouery. i MI K I KKGLNLP I AGRPEQVI YDG PVI TEVALLGEE YAGMRPXMKVKEG DAVKKGQVLFE D 60 

MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED 
Sbjct : 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDWKKGQVLFED 60 

Query 61 KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 120 

KK PGWFTAP SG + I +RGEKRVLQS WI VEG+++I F RY LA+LS + 
Sbjct: 61 KKN PG WFTAPASGT WT INRGEKRVLQS WI KVEG DEQI T FTRYEAAQLASLS AEQVKQ 120 

Query 121 NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPWVIKEAXXDFRRXXLV 180 

NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP W+KE DF+ V 
Sbjcf 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEWLKEYETDFKDGLTV 180 



15 Query 181 LSRL— TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237 

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 24 0 

Query: 238 WTINYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADN 297 
20 - w +NY q DVIAIG+lf TG L t+R+I+L G QV PRL+RT LGA +SQ+TA EL +N 

sb j ct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 2 98 RVISGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 357 
RVISGSVL+GA G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
25 Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360 

Query: 358 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 417 

K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419 

Query: 418 XXXXXS FVC PGKYEXGPLLRKVLETXEKEG 4 47 
•^+VCPGK GP+LR LE EKEG 

ORF22ng-1 also shows homology with the OMP from A. pleuropneumoniae:

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449

Score = 555 bits (1414), Expect = e-157
Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

Query: 2 7 MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86 
40 MI IKKGL+LPIAG P QVI++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED 

Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDWKKGQVLFED 60 

Query: 8 7 KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 14 6 
KKN PGWFTAP ASG + I+RGEKRVLQSWI VEG+++I F RY LA LS+E+V++ 
45 Sbjct: 61 KKNPGW FT APASGTWT INRGEKRVLQS WIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120 

Query: 147 NLIQSGLWTALRTRPFSKIPAVDAEPFAI FVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 206 

NLI + SGLWTA RTRPFSK+PA+DA P +1 FVNAMDTNPLAADP V++KE DFK GL V 
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSS1FVNAMDTNPLAADPEWLKEYETDFKDGLTV 180 

Query: 207 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIKFIEPVGANKTV 263 

L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF+ + PVGA K V 

Sbjct: 181 LTRLFNGQKPVYLCKDADSN I PLS PAI EG IT I KS FSGVHPAGLVGTK I H FVDPVGATKQV 24 0 

55 Query: 2 64 WTINYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADN 323 

W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N 
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRI ISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 

Query: 324 RVI SGSVLNGAIAQGAHDYLGRYHNQI SVIEEGRSKELFGWVAPQPDKYS ITRTTLGHFL 383 
60 RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 

Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360 

Query: 384 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 4 43 
K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 
65 Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDI I PTLLLRDLAAGDTDSAQNLGCLELDEE 419 






Query: 44 4 XXXXXSFVCPGKYEYGPLLRKVLETIEKEG 47 3 

++VCPGK YGP+LR LE IEKEG 
Sbjct: 420 DLALCTYVCPGKNNYGPMLRAALEKIEKEG 44 9 






Based on this analysis, including the homology with the outer membrane protein of Actinobacillus
pleuropneumoniae, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 16

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 133>:

1 ..GCGnCGnAAA TCATCCATCC CC.nACGTC GThGGCCCTG AAGCCAACTG
51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG
101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG
151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA
201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC
251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA
301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT
351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA
401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT
451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC
501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA
551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT
601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA
651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT
701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC
751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT
801 GrkCmmmTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT
851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT
901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA
951 TCCCGCACCT TAA
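The partial sequence contains lowercase symbols such as n, h, m, y, s, w, k and r alongside the four bases. The specification does not define them; the sketch below assumes they are standard IUPAC nucleotide ambiguity codes and shows how one ambiguous triplet expands into the concrete codons it could represent. The names `IUPAC` and `expand` are illustrative only.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed to be the convention in use here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq: str) -> list[str]:
    """List every unambiguous sequence compatible with an IUPAC-coded string."""
    choices = [IUPAC[symbol] for symbol in seq.upper()]
    return ["".join(bases) for bases in product(*choices)]


# The triplet "myG" from the row beginning at nucleotide 401 of the partial sequence.
print(expand("myG"))
```

Under this assumption the triplet myG is compatible with ACG, ATG, CCG and CTG. The complete sequence <SEQ ID 135> reads ATG at the corresponding position, and the partial protein <SEQ ID 134> carries X there.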

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL
51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE
101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS
151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF
201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT
251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI
301 WVFVLGLPVG PGAPTFYPAP *

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be:

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA
51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA
101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC
151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT
201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA
251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG
301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC
351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA
401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT
451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA
501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT
551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC
601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC
651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT
701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA
751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC
801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT
851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT
901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT
951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA
1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG
1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT
1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG
1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC
1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC
1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG
1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC
1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC
1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC
1551 ATTCTATCCC GCACCTTAA
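A simple consistency check between a complete gene listing and its encoded protein can be sketched as follows, assuming the standard stop codons and a single open reading frame starting at nucleotide 1. The helper names `is_complete_orf` and `expected_protein_length` are hypothetical, and the check does not look for internal stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean(seq: str) -> str:
    """Remove the whitespace between 10-nt blocks and normalise case."""
    return "".join(seq.split()).upper()


def is_complete_orf(seq: str) -> bool:
    """True if the sequence starts with ATG, ends with a stop codon and
    has a length that is a whole number of codons."""
    s = clean(seq)
    return (
        len(s) >= 6
        and len(s) % 3 == 0
        and s.startswith("ATG")
        and s[-3:] in STOP_CODONS
    )


def expected_protein_length(n_nucleotides: int) -> int:
    """Residues encoded by a complete ORF of the given length (stop excluded)."""
    return n_nucleotides // 3 - 1


# The complete ORF12-1 listing runs to nucleotide 1569 (row 1551 holds 19 nt).
print(expected_protein_length(1569))
```

A 1569-nucleotide open reading frame corresponds to 522 residues plus a stop codon, which agrees with the 522 aa overlaps reported for the complete ORF12 sequences.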

This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS
51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGAPTFYP AP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N.
meningitidis:



10 20 30 

orfl2 pep AXX1IHPXXWGPEANWFFMVASTFVIALI 

I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 12a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQI1HPDYWGPEANWFFMVASTFVIALI 
180 190 200 210 220 230 

40 50 60 70 80 90 

orf 12 .pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
I | | 1 | M I I I I M M M I II I I 1 I I I I I I I I I I I I M I II I ! I II 1 II I I I I I I I! I I I 1 
or f 12a GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
240 250 260 270 280 290 

100 110 120 130 140 150 

orf 12 .pep P ADG I LRH PETGLVSGSP ELKS I WF1FLLFALPG I VYGRVTRSLRGEQEWNAXAE SMS 
| | | | | | i i I I I I I I I I I II II I II I I I I I I I I I II I I I I I I I I M I I I I I I I I I INN 
o r f 1 2 a PADG I LRHPETGLVSGS PFLKS I WFI FLLFALPG I VYGRVTRSLRGEQEWNAMAESMS 

300 310 320 330 340 350 

160 170 180 190 200 210 

orf 12 .pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 
]| | I M M I I I M I i I I M I II I I I I I I II M II I I I M M 11 M II I I I I I II M 
orf 12a TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAT FLKEVGLGGSVLFIGFILICAFINLM 

360 370 380 390 400 410 

220 230 240 250 260 270 

orf 12. pep ' IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 
I I I I I | | | IS M I M M M I 1 M I M I I I M M I I I M M I II 11 M I I I M II I II I 






orfl2a IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 
420 430 440 450 460 470 

280 290 300 310 320 

or f 12 . pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
! I M | I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I M I I 11 I I I I I 
orfl2a KKDAGVGTLI SMMLPYS AFFLI AWI ALFC I WVFVLGLPVGPGAPT FYPAPX 

480 490 500 510 520 

The complete length ORF12a nucleotide sequence <SEQ ID 137> is: 
1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA
51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA
101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC
151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT
201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA
251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG
301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC
351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA
401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT
451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA
501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT
551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC
601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC
651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT
701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA
751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC
801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT
851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT
901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT
951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA
1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG
1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT
1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG
1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC
1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC
1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG
1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC
1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC
1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC
1551 ATTCTATCCC GCACCTTAA



This encodes a protein having amino acid sequence <SEQ ID 138>: 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS
51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGAPTFYP AP*



ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap:



10 20 30 40 50 60 

orf 12a. pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK 
I I I I I I I I I I I I I I I I I I I I ! I I I II I I II II ! I M I I I I I I : I I M II I I I M ! 1 M I I 
orf 12-1 MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFI I FIVLLL IAS AVGAYFGLS VPDPRPVGAK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12a . pep GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
It I I I I I I :: M I I : II I : 1 I I M I I 1 I I M I I I I I I II I II M I I I I M I I I I I I I I I 1 
orf 12-1 GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
70 80 90 100 110 120

                    130       140       150       160       170       180
orf12a.pep   LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1      LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
                    130       140       150       160       170       180

190 200 210 220 230 240 

10 orfl2a oeo GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 

10 orfl2a.pep | j | | | | j | | | l | || M M I I I M I I I I I I M M i I I 1 1 I II M I I I I I I I I M M I I I I I 

orf 12-1 GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI 
19 0 200 210 220 230 240 

ic 250 260 270 280 290 300 

13 nr-12a oeo v^PQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

or.l2a. P e P | | | | | | | | i | | | || I I I I I I I I M I M M I M I I M I I I I I I I I I II I I I I M I I I I 

nrfl2-i VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 
250 260 270 280 290 300 

20 310 320 330 340 350 360 

orfl2a oeo PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

| f | M I I I t 1 M I I I I I I I f I t I I 1 I I I I I 1 I 1 1 I 1 t E 1 I 1 1 1 1 I I I I I 1 t f I I I 

nrf12 _i PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

25 310 320 330 340 350 360 

370 380 390 400 410 420 

nrfl2a oeo 1 17 r AAQ WAFFNWTN I GQY I AVKG AT FLKE VGLGG S VL FI G FI L I C AF INLM I G S AS AQW 
* j u | | | ]| ) | | | | | | I I I I I M I I I I I II I I I M M t I I I I M I I 1 I I i I I I M I M 1 II 
„ f1 ? , t-^aaqitvaFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
JU °' r ~ " 370 380 390 400 410 420 

430 440 450 460 470 480 

o-fl2a oeo A VTA^I WPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

35 ^ | t Ml II I I I l I I I M I II 1 I I I I I I I M I I M I I II I I I I 1 I I I 1 I I I I MM 

J or-12-l AVTAPI FVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT . 

430 440 450 460 470 480 

490 500 510 520 

40 cr-12a oep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

I I I I ] I i I H I I I I I I M I I I M I I I M I II I I I M I I I I II I 
0~*i 2-1 LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 
4 90 500 510 520 

Homology with a predicted ORF from N. gonorrhoeae

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12ng) from N.
gonorrhoeae:



orf 12 . pep 



AXXIIHPXXWGPEANWFFMVASTFVIALI 30 
I j | | | I I I I I I I I II I : II I M M I I 

50 orfl2ng aaafaGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALI 232 

orf 12 Dec gYFV^EKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 90 
I 1 I M M II I I I I M M M M I I I I M I I I I I I I I M I M I I I I I I II I I Ml I I I I I I I 

GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 292 



55 



orf 12ng 



o-fl2 pep padgILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 150 

I I I M I I M M I 1 I : I I M II I M II I II M I I I M M I I : I M I I I I: M I I I Mill 

orfl2ng PADG I LRH PETGLVAG S P FLKS I WFI FLLFALPGI VYGRI TRS LRGERE WN AMAESMS 352 

60 Q-P12 Pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 210 

" ' H j I I I | | M M I I I M M M II I I M II M : M I : I I I I I M M I I M I I I I I M 

orf 12ng TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM 412 

or^l2 peo IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNI ITPMMSYFGLIMATVXXY 270 

65 ~ | | | | | | i | M M M I I I M I I M I I : I M M II I I II I I M M I M M I I I M ! I I I 

orf!2ng IGSAS AQWAVTAPIFVPMLMIAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472 
orf12.pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320

M I I I II II i I I I M I I I M II I I II I M M H I I I I M I I : I I I I I : I 
orfl2ng KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGT PTFYPVP 522 

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is: 



1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA
51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA
101 TTGTGTTATT GCTGATTGCC tctgCCGTCG GTGCGTATTT CGGACTATCC
151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT
201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA
251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG
301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC
351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA
401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT
451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA
501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT
551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC
601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC
651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT
701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA
751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC
801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT
851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT
901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT
951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA
1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG
1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT
1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG
1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC
1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC
1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG
1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC
1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC
1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC
1551 ATTCTATCCG GTGCCTTAA



This encodes a protein having amino acid sequence <SEQ ID 140>: 



1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS
51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGTPTFYP VP*



ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1:



10 20 30 40 50 60 

orf 12-1 - pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK 
I 1 I I I :: I : I I I I I I I I I I I I I I I M I I I ! I 11 I I 1 I I I I I I I I I I I I I M I I II M I I I 
orf 12ng MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12-1 pep GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVG1AEKSGLISALMR 
M 1 I II M : : I M I : I I I : I I I M I M I I I M I I I I I II I M I I I I I I I I I I I I I I I I I I 
orfl2ng GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 

70 80 90 100 110 120. 

130 140 150 160 170 180 

orf 12-1 . pep LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS 
I I | I I | I I I I I I I I I 11 I I I M I M II I I ! I 1 I I I I M : I I t 1 I I II I M I I I I I I I I I I 
or f 1 2 ng LLLTKS PRKLTTFMWFTGI LSNTASELGYWLI PLSAVI FHSLGRHPLAGLAAAFAGVS 
130 140 150 160 170 180



190 200 210 220 230 240 

GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPE7VNWFFMVASTFVIALIGYFVTEKI 

I | | | | | | | | | | | | | | I I I I I I I I I M I I I I I I II 1 I I M I I I : I M I I I I I I I I I I I I I I 
GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALIGYFVTEKI 

190 200 210 220 230 240 

250 260 270 280 290 300 

VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH 
I I | | I t I I I I I I I M I I I I I I I M I I I I i I I I I M M I I I i I I I I I I I M I I 1 I I I M I I 
VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH 
250 260 270 280 290 300 

310 320 330 340 350 360 

PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

| | | I I I : I 1 M I I M I I I I I I I I I I I M I I I I : I I I I M I : I I I M I I I I I I I I I I I I I I 
PETGLVAGSPFLKSIWFIFLLFALPGXVYGRITRSLRGEREWNAMAESMSTLGLYLVI 

310 320 330 340 350 360 

370 380 390 400 410 420 

I ^FAAQFVAFFNWTN IGQYI AVKGAT FLKEVGLGGS VLFI G FI LI CAFINLMIGS ASAQW ~ 

| | | | M I I I I I I I I I I I I I I I II I I : M I I I I I I I M I I I I M I I I I I I I I I I 

I FFAAQFV AFFN WTN I GQ Y I AVKGAVFLKEVGLGGSVLFIGFI LI CAFINLMIGS ASAQW 

370 380 390 400 410 420 

430 440 450 460 470 480 

AVTA P IFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

I I 1 1 M I I II I I M I I i I I I I I t I I I M I I I I I I I I I I I I II M I I I I I I I M II I I I M 
AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
430 440 450 460 470 480. 

490 500 510 520 

LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPT FYPAPX 

M | t | | M I I I ! I I I I I 1 I I I I I I I I I I I I M N : I I I I I : I I 
LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX 
490 500 510 520 

In addition, ORF12ng shows significant homology with a hypothetical protein from E. coli:

sp|P4cI3 3|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1757597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89

Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)



orfl2-l.pep 
orf 12ng 

orf 12-1 .pep 
orf 12ng 

orfl2-l.pep 
orf 12ng 

orf 12-1 .pep 
orf 12nc 

orf 12-1 .pep 
orf 12ng 

orf 12-1 . pep 
orf 12nc 



Query: 8 
Sbjct : 
Query: 6 
Sbjct 



RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67 
+SG+ VE +GN +PHP +A+ + FG+S +P D 
13 QSGKLYGWVEP.IGNKVPHPFLL-FIYLI IVLMVTTAILSAFGVSAKNP TDGTP 64 



IHWSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127 
+ V + LL +GL L + +KNF+GFAP +AE+ GL+ ALM + + 

65 WVKN LL S VEGLH W FL PN V I KN FSG FAP LG A I LALVLGAG LAER VGLL PALMVKMASH VN 124 



Query: 126 RKLTTFMVVFTGILSNTASELGYWLIPLSAVI FHSLGRHPLAGLAAAFAGVSGGYSANL 187

+ ++MV+F S + +S^ V++ P-r A+IF ++GRHP+AGL AA AGV G++ANL 

Sbjct: 125 AR Y AS YMV L FI AFFS H I S S DAALV IMP PMG AL I FIAVGRH PVAGLLAA I AGVGCG FT ANL 184 

Query: 188 FLGTIDPLLAGITQQAAQI IHPDYWGPEANWFFMAASTFVIALIGYFVTEKIVEPQLGP 247 

+ t D LL+GI+ +AA +P V NW+FMA+S V+ ++G +T+KI+EP+LG 
Sbjct: 185 LIVTTDVLLSGISTEAAAAFNPQMHVSVIDNWYFMASSVWLTIVGGLITDKIIEPRLGQ 244 

Query: 248 YQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRHPETGLVA 307

+q ^ ++ + + S GL AGW + A +A ++P +GILR P V 

Sbjct: 245 WQGNSDEKLQTLTESQRF GLRIAGWSLLFIAAIALMVIPQNGILRDPINHTVM 298 

Query: 308 GSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMSTLGLYLXXXXXXXXX 367 

SPF+K IV I L F + + YG TR++R + ++ + M E M + ++ 
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358 

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427 
NW+N+G++IAV L+ GL G F+G L+ +F+ + I S SA W++ APIF
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487
VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514
Y FL+ W+ + W +++GLP+GPG
Sbjct: 479 YPLIFLWWLLMLLAW-YLVGLPIGPG 504


Based on this analysis, including the presence of several putative transmembrane domains and the
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 17

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 141>: 

1 ..ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA
51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC
101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA
151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG
201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT
251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA
301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG
351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT
401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC
451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG
501 ACT..

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>:

1 ..TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA
51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI
101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG
151 RXLTNPTVSV RIMLHSG..

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N.
meningitidis:






10 20 30 

orf 14 .pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 

1:1111 I I I I I I I : I : : I I I I : I lilt 
40 orf 14a GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET 

150 160 . 170 180 190 200 

40 50 60 70 80 90 

orf 14 .pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
45 I I I I I I ! I M I I I i I I I It I I I I I II I I I I I I I I M ! I I I I I I I I i I I I I I 1 M 1 I 1 I I 

orf 14 a GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
210 220 230 240 250 260 



100 110 120 130 140 150 

50 orf 14 .pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 

I I I I I I I I I I I M I I I I I I i I I I I It I I I I I I 1 I I I M I I I M I I I I I I I I | | ! I I II .1 I 
orf 14a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320

                   160
orf14.pep   RXLTNPTVSVRIMLHSG
            | |||||||||||||||
orf14a      RSLTNPTVSVRIMLHSGLMYSRRAVVSSVAKSWSFAYMPDLVSRLNRLDLPTLVX
           330       340       350       360       370       380
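The identity figure quoted for an alignment of this kind can be reproduced by counting matching columns. The following is a minimal sketch, not the program used to generate the alignments in this document; the function name `percent_identity` is hypothetical, and it assumes that an ambiguous residue (X) counts as a mismatch and that every aligned column counts toward the overlap length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("alignment is empty")
    # Assumption: 'X' (unknown residue) and '-' (gap) never count as a match.
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a not in "X-"
    )
    return 100.0 * matches / len(seq_a)


# Final block of the ORF14 / ORF14a alignment (residues 151-167 of ORF14).
orf14_tail = "RXLTNPTVSVRIMLHSG"
orf14a_tail = "RSLTNPTVSVRIMLHSG"
print(f"{percent_identity(orf14_tail, orf14a_tail):.1f}% identity over {len(orf14_tail)} aa")
```

For this 17-residue block the function counts 16 matching columns; the single mismatch is the X/S pair at the second position.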

The complete length ORF14a nucleotide sequence <SEQ ID 143> is:

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG
51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG
101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT
151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA
201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG
251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG
301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA
351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG
401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG
451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA
501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG
551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG
601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC
651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG
701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT
751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG
801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT
851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG
901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG
951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC
1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC
1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC
1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG

This encodes a protein having amino acid sequence <SEQ ID 144>:

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF
51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK
101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR
151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG
201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF
251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA
301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR
351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV*

It should be noted that this sequence includes a stop codon at position 118. 
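The position of an internal stop codon can be checked directly against the printed protein sequence. The sketch below is illustrative only; the helper name `internal_stops` and its `first_position` parameter are hypothetical, and it assumes that stops are written as `*` and that a single `*` at the very end of the input is the normal terminator rather than an internal stop.

```python
from typing import List


def internal_stops(protein: str, first_position: int = 1) -> List[int]:
    """Return 1-based positions of '*' symbols, ignoring a single terminal stop."""
    residues = "".join(protein.split())  # drop the spacing between 10-residue groups
    body = residues[:-1] if residues.endswith("*") else residues
    return [first_position + i for i, aa in enumerate(body) if aa == "*"]


# First two 10-residue groups of the row that starts at residue 101 of SEQ ID 144.
row_101 = "LLFDQPDAGG AGDAAEH*NR"
print(internal_stops(row_101, first_position=101))  # prints [118]
```

Passing `first_position` lets a single printed row be checked without concatenating the whole sequence.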

40 Homology with a predicted ORF from N. gonorrhoeae 

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N. 
gonorrhoeae: 

orf 14 . pep TAGAAGXXVFVFVTD3QVEVFGNIQTAVET 30 

I! ill II : I I : I : I : : I I I I : I MM 
45 orf 14ng GRQFG FFRVGGAS FVI T AQAG I DDALC DCLTADAAG FAV FAFVADGQMQV FGNVQPAVET 208 



50 



orf 14 .pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90 

M I I I I I I I I I II I I I I I I I M 11 M II I I I I I I I I I I I I I I I I II M I II I I I I I M I 

orfl4ng GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 2 68 

orf 14 . pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150 

I I I I I II I I M I M I M I I I I I I II I I I I I I 1 I I M I I I M I : I II : I I I M I I I I I I 

orf!4ng VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG 328 



55 orf 14. pep RXLTNPTVSVRIMLHSG 167 

I I I I I I I I I I I II I : I 

orf 14ng RSLTNPTVSVRIMLHAGLMYSRRAWSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382 

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein
having amino acid sequence <SEQ ID 146>:

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF
51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK
101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR
151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG
201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF
251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA
301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR
351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.

Example 18 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 147>: 

1 ..GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT
51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA
101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG
151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC
201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA
251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC
301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC
351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG
401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG
451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC
501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC..

This corresponds to the amino acid sequence <SEQ ID 148; ORF16>:

1 ..GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL
51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV
101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK
151 EYXPETYARY HGIDVAANQE KANWIALLKX A..

Further work revealed the complete nucleotide sequence <SEQ ID 149>:

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC
51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG
101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC
151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG
201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC
251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT
301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG
351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT
401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC
451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT
501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG
551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC
601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC
651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT
701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA
751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT
801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA
851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG
901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC
951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG
1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT
1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG
1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG
1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT
1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC
1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC
1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG

1351 GTTTGA

This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>:

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*
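The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating the DNA codon by codon. The sketch below is one possible way to do this and is not taken from the original work; the names `CODON_TABLE` and `translate` are hypothetical. It assumes the standard genetic code, reads in frame from the first base, and reports any codon containing an ambiguous base (such as `n`) as `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; spaces, digits and dots in the listing are ignored."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguous bases are not in the table and become 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 149.
print(translate("ATGTCGGAAT ATACGCCTCA AACAGCAAAA"))  # prints MSEYTPQTAK
```

The same function applied to the start of the partial sequence SEQ ID 141 yields X at the two codons that contain an undetermined base.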

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N.
meningitidis:

10 20 30 

erf 16 pep GHYSDRTWKPRLXGRR LPYLLYGTLIAVIV 

I I I II I I I I ! I I I I 1 1 I I I I It It I I i M 
or f 16a iFQTLGADPHSLGW FFILPPLAGMLVQPIVG HYSDRTWKPRLGGRR LPYLLYGTLIAVIV 
20 50 60 70 80 90 100 

40 50 60 70 80 90 

orfl6 pep e»'I t MPNSGSFG FGY ASLAALSFGALMIALLDV SSNMAMQPFKMM\ ; GDMVNEEQKXYAYGI 
^' TTTi I ! i ! M S I I I I I I I I M I I I I I I I I I I I I I M I I I I M I I II I I I 1 I I ! I I M I ! 

?5 orf 16a yiLMPNSGSFGFGYA SLAALSFGALMIALLDV SSNMAMQPFKMMVGDMVNEEQKGYAYGI 

110 120 130 ~ 140 150 160 

100 110 120 130 140 150 

0^fl6 pep Q5^LANTG AWAAILPFVFAYIGLA NTAXKGWPQT VWAFYVGAALLVITSA FTIFK\ y K 

30 I I 1 I I I I I I I I I I I 1 II M I I I I M I M I I II I I I i I ! I I i I I I I ! I I I I I I I I 

orf 1 6a OS FLANTG A WAAI LP FVFAY I G LA NTAEKGW PQT VVVAFYVGAALLV I TS A FT I FKVK 

170 180 190 200 210 220 

160 170 180 

35 orf 1 6 . pen EYXPZTYARYHGIDVAANQEKANWIALLKXA 

II ! I I M I I II I I I I 1 I I I I I I I I Ml:l 
orf 16a EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI 
230 240 250 260 270 280 

40 orf 1 6a AENVWHTTDASSVGYQEAGNWYG VLAAVQSVAAVICSFVL AKVPNKYHKAGYFGCLALGA 

290 300 310 320 330 • 340 

The complete length ORF16a nucleotide sequence <SEQ ID 151> is: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC
51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG
101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT
151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG
201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC
251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT
301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG
351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT
401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC
451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT
501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG
551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC
601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC
651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT
701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA
751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT
801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA
851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG
901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC
951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG
1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT
1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG
1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG
1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT
1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC
1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC
1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG
1351 GTTTGA



This encodes a protein having amino acid sequence <SEQ ID 152>: 



1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*



ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap:

                    10        20        30        40        50        60
orf16a.pep  MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf16-1     MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf16a.pep  ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf16a.pep  LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf16a.pep  FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
            ||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||
orf16-1     FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf16a.pep  ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf16a.pep  EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf16a.pep  LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
                   370       380       390       400       410       420

                   430       440       450
orf16a.pep  GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
            |||||||||||||||||||||||||||||||
orf16-1     GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
                   430       440       450

Homology with a predicted ORF from N. gonorrhoeae

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N.
gonorrhoeae:

or f 16. pep 
orf 16ng 
orf 16. pep 
orf 16ng 
orf 16. pep 
orf 16ng 
orf 16. pep 
orf 16ng 



GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 30 

I : I I I I I I I I 1 I I I I 1 I I I I I I I I I i i I I 
HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 131 



MILMPNSGSFGFGYASIJ^SFGALMIALLDVSSNMAMQPFKI^ 

I 1 I I I I I M I It i I I I I 1 i I I I t I I I I I I I I It I I I I I I 1 M i M I I II I I i M Mill 
MIIJ^PNSGSFGFGYASIAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 



90 



191 



150 



QS FLANTGAWAAI LPFV FAY I GLANTAXKGWPQTVVVAFYVGAALLVI TSAFTI FKVK 
I I I I t I I I II I I I M I I I II I I I I I I I I I I I I M I I I I I M I M I I : I M I I t I I II 
Q S FLANT DAWAA ILPFV FAY I GLANTAEKGW PQT WV AFYVGAALL 1 1 T S AFT I SKVK 251 



E YX PET YAR YHG I DVAAN QEKANW I ALLKXA 
I I I II I M I I I M I I M I 1 I I I i : I i I : 1 

EYDPETYARYHGIDVAANQEKANWFELLKTAPCTFWTVTPVQFFCWFAFRYMWTYSAGAI 



181 



311 



The complete length ORF16ng nucleotide sequence <SEQ ID 153> is:

1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA
51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT
101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT
151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT
201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG
251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT
301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG
351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT
401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC
451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT
501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC
551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG
601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA
651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG
701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC
751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC
801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA
851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC
901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA
951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG
1001 GCGTTTTGGC GGCGGTGTAG

This encodes a protein having amino acid sequence <SEQ ID 154>: 



1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD
51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD
101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA
151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA
201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV
251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF
301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV*



ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap:



30 40 50 60 70 80 

orf 16-1 pep MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT 

I : : I 1 I I ! : I : II M I 

orfl6ng DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT 
50 60 70 80 90 100 

90 100 110 120 130 140 

orf 16-1 pep WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 
I I M | | | I I II I I II I I I I I I I I I II I I I M I I I I I i I I M I I I I M I I I II I I I M I II 
orf 16ng WKPRLGGRRLPYLLYGTLIAVIVKILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 
110 120 130 140 150 160

150 160 170 180 190 200

orf 16-1 . pep MQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILPFVFAYIGLANTAEKGVVPQTV 
| | J M I M I M I I I I I I : II M I M II 11 I I I I I I I I I I I I I I I I I I II I I I I I I I I I ! 
5 orf 16ng MQPFKMMVGDMVNEEQKSYAYGIQSFIANTDAVVAAILPFVFAYIGLANTAEKGVVPQTV 

170 180 190 200 210 220 

210 220 230 240 250 260 

orf 16-1 .pep WAFYVGAALLVITSAFTI FKVKEYDPETYARYHGI DVAANQEKANWIELLKTAPKAFWT 
10 * ' II II I I I I I II : M I I I I I I I I I M I I I I II I I I I I I II M I I I I I : I I I I I I I I : I I I 

orf 16ng WAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT 
230 240 250 260 270 280 

270 280 290 300 310 320 

1 5 orf 1 6-1 . peD VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDAS SVG YQEAGNWYGVLAAVQSVAAVICS 

II II M I I I I I : M I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I M It 
orf 16ng VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX 
290 300 310 320 330 340 

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 19 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 155>:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT
51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA
101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG
251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC
301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG
351 CAGCCAGAAT




This corresponds to the amino acid sequence <SEQ ID 156; ORF28>: 

1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQIVXDT
101 PSYXCHQALP VKLGSXGSQN ...

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT
51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA
101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG
251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC
301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG
351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA
401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA
451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA
501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG
551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC
601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC
651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG
701 ATGCCGCCCG CAAATGA



This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>:

1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT
101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK
151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS
201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF28 shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf28 pep MLFRKTTAAVLAHTLMLNG CTLMLWGMNNPVSETITRKHVXKDQIRXFGWAEDNAQLEK 

fTTTTTTTTTTl iTTTTTi i : i : m 1 1 : i in :iin nm immmiimi! 

0 r f 2 8 a MLFRKTTAAVLAATLMLNG CTVMMWGMN S P FSETTARKH VDKDQ I RAFGWAE DNAQLE K 

JQ * 10 20 30 4 0 50 60 

70 80 90 100 110 120 

orf2 8 pep GSLVMMGGKYWFWNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 
I | M i | | I | | | | I I I I I I I I I I I i I I I I I II : I : I : : I I I I I M I : I I I 
15 or f 2 8a GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 

70 80 90 100 110 

o-f8a FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
120 130 140 150 160 170 

The complete length ORF28a nucleotide sequence <SEQ ID 159> is:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT
51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA
101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG
251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC
301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG
351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC
401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC
451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA
501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC
551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG
601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT
651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT
701 CCTCAGACAA ATGA

This encodes a protein having amino acid sequence <SEQ ID 160>:

1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN
101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL
151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK
201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK*

ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

orf28a pep MLFRKTTAAVIAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK 
45 I 1 I I I I I I I I I I I M II I I I I : I :! I II : I Ml : I I I I I M I M I I I I I I I I I I I I II 

orf 28-1 • MLFRKTTAAVLAAT LMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGWAEDNAQLEK 

10 20 30 40 50 60 

70 80 90 100 110 119 

50 orf28a pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 

| | | | | | M | I II I M I I II I II II II I I I I I! I 1 : I i : I : I : I I I I I I I I I I : I I I 
0-f28-3 GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 

70 80 90 100 110 120 

55 120 130 140 150 160 170 179 

o-f28a pep FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
I M | M | I I I I I I : I I 11 11 I I I I I I I I : I I I I I I I I I I I I I M I I I M I I M I I M I I 
orf 2 8-1 FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 

130 140 150 160 170 180

180 190 200 210 220 230

o-f28a oeD E QSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX 
* H I | 1 | | | | I I I I I I :: I I I I I I I I II HI II I M : I I ! I I I I : I I I I :: : :: M 

5 or-f2 8-T EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAWDAARKX 

190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae

ORF28 shows 84.2% identity over a 120aa overlap with a predicted ORF (ORF28.ng) from N.
gonorrhoeae:

orf28 pep MLFRKTTAAVIJUiTLMLNGCTIJyiLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK 60 

| | | M | | | | | U M:lllll:M I I I I I I I : I I ! I I I I Mill H I I M I I I I I I I 
orf2 8ng M LFRKTTAAVLAATLILNGCTMMLRGMNNPYSQTITRKHVDKDQIRAFGWAEDNAQLEK 60 

15 or^28 pep GSLVMMGGKYWFWNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 120 

I I I I I I I I I I I I : I I I I I I I I I : I 1 i I I I I I I I I 1 I ! I I I I I I I I I :: I i I I 
orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 120 

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is:

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT
51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA
101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG
151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG
201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG
251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC
301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG
351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA
401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA
451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA
501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG
551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC
601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC
651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT
701 CCTCAGACAA ATGA

This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV
51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT
101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK
151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS
201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK*

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap:

10 20 30 40 50 60 

or f 2 8-1 pep MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGWAEDNAQLEK 
I I i I M I I I M I I I I : I II II : I I I I I 1 I H : I M I I I I I I I I I I I I I I I M I I I I I I I 
orf28ng ML FRKTT AAV LAAT L I LNGCTMMLRGMNN PVSQTI TRKHV DKDQ I RAFGWAE DNAQLEK 

45 ~ 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 28-1 . pep GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 
I I I I I i I I I I I I : II I I I M II 1 I : M ! II I I I I I I I I I M I I I M I M M I : I : I I I I I 
50 orf28ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 28- 1 . pep FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRT I YTRCVSAKGKYY AT PQKLNADYHF 

55 * * III I I M I I I I : I I I I I I I I I I : I I I 1 M I I II I I I I I I I I I I I I I I I I I II i I I I 

orf28ng FSTGGLCLRYDTGRPDDIAKLKQLEFKAVKLDNRT I YTRCVSAKGKYYAT PQKLNADYHF 

130 140 150 160 170 180 

190 200 210 220 230 239 

60 orf 28- 1 . pep EQSVPADI YYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX 

I I II I I I I I I M I I : I I I I I II I : 1 I I I 1 I I : I I I I M : M I : I I I : : I : 
orf28ng EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF28-1 (24 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen.

Example 20 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 163>:

1 ..GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT
51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC
101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA
151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA
201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG
251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA
301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA
351 AGAAAATGCC GGTGCCGCCT CTGGT..

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>:

1 ..VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK
101 TKTSIVPQAP FSDRWLEENA GAASG..

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC
51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC
201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA
251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA
301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA
351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA
401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC
451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA
501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT
551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC
601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA
651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG
701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA
751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA
801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG
851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT
901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC
951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG
1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA
1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG
1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC
1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA
1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA
1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG
1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT
1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA
1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT
1451 GA

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS
151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS
201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS
301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK
351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS
401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY
451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)
ORF29 shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) from strain A of N. meningitidis:

10 20 30 

erf 29 .oeo VSPVLPITHERTGFEGVIGYETHFSGHGHE 

I : I : I i II II I I I I I I : II II I I I I I I M I 
25 or f 29a EPGGKYKLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE 

50 60 70 80 . 90 100 

40 50 60 70 80 90 

or f 2 9. Deo VHSPFDKHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 
30 I II I I I : I I I I I I I I I I I i I I 1 M M I I I I I I I II f I I I Mill:: I I M I I 1 M I I 

orf2Sa VHSPFDNKDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY 
110 120 130 140 150 160 

100 110 120 

35 orf 29 . peD SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 

I I I I I I M 1 I : : I I I : I I I M I I I : I I I I I II I 
orf 2 9a XXYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR 
170 180 190 200 210 220 

40 orf 29a MDD I RG I VQGAVN P FLMG FQG VG I GAI T DS AVS PVTDT AAQQT LQGXNHLGXLS PEAQLA 

230 240 250 260 270 280 

The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC
51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC
201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA
251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA
301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA
351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA
401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC
451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA
501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT
551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC
601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA
651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG
701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA
751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA
801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG
851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC
901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC
951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG
1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN
1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG
1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA
1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG
1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA
1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC
1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT
1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG
1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG

This encodes a protein having amino acid sequence <SEQ ID 168>: 

1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS
151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS
201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS
301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX
351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG
401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD
451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL*
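
The ORF29a listings contain undetermined bases (N) in the nucleotide sequence and undetermined residues (X) in the protein. The sketch below illustrates one way such a translation could be produced; it is an assumption made for illustration and not a description of the software actually used. A codon containing N is expanded to every possible base, and a residue is emitted only when all expansions agree; otherwise X is emitted. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Return the residue for a codon, or 'X' if N makes it ambiguous."""
    options = [BASES if base == "N" else base for base in codon.upper()]
    residues = {CODON_TABLE["".join(combo)] for combo in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate_ambiguous(dna):
    """Translate in frame 1, tolerating N; whitespace between groups is ignored."""
    dna = "".join(dna.split())
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


# Bases 49-69 of the ORF29a nucleotide listing (codons 17-23).
fragment = "TCGTNGCTGCAAATCCCNATT"
print(translate_ambiguous(fragment))  # SXLQIPI
```

In this fragment TNG could encode L, S, W or a stop, so it is reported as X, whereas CCN encodes proline whatever the third base, which agrees with residues 17 to 23 (SXLQIPI) of the ORF29a protein listing.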

ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 

10 20 30 40 50 60 

orf2 9a peD MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
25 * II It I I M M II M II I I I I I I I I M M I I I I I I I 1 I I I I I I I I I I I M I I I I i I I t : 

orf2 9-l MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKKYEPGGKYHLFGNARGSVKK 

10 20 30 4C 50 60 






70 80 90 100 110 120 

orf29a pep RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
! | i I | | M II I I t : I : I ! I 1 I I 11 I I I I : M M I I I t M M 1 I I I i I I : t I I I I I I I I I I 
orf29-l RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf2 9a pep GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR 
M i I I I I I M I I I II M I i M M I I I I I I I I I I ! I I I I II I i 1111111111:1111: 
or f 2 9-1 GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

o-f29a pep APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG 
I I I I I I I I I I I I II I I II II II I I I I II I I I I I I I I M I I I M I I : I I I I I I M I I I II I 
orf29-l APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 29a . pep FQGVG I GAIT DSAVSPVTDTAAQQTLQGXNHLGXLSPEAQ LAAATALQDS AFAVKDGINS 

I I 11 I I I I I II M I I I I I I I I I I M I I I I II II I I I I I I I I : 1 I I I I 11 I I I I I I! 
orf 29-1 FQGVGIGAIT DSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf29a.pep ARQW ADAH PN I T AT AQT ALA VAXAAT T VWGGKKVE LNPTKW DWVKNTG YXT P AVRTMHT L 

I : 1 I I I II I I I I I I I I II I :: I I 1 Ml I I I I I I I I M I M I I I I I I M : I I : I I 
orf 2 9- 1 AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVK>JTGYKKPAARHMQTL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 29a. pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS 

11111111:111: 111: 1 
orf 2 9-1 DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK 

370 380 390 400 410 



BNSDOC1D: <WO 9924578A2_I_> 



WO 99/24578 



PCT7IB98/01665 



-146- 



Homology with a predicted ORF from N. gonorrhoeae

ORF29 shows 88.8% identity over a 125aa overlap with a predicted ORF (ORF29.ng) from N. gonorrhoeae:

orf29.pep                               VSPVLPITHERTGFEGVIGYETHFSGHGHE 30
          I : I : I I I I I I I I I I I I I I I I I I II I I I I I i
orf29ng   EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE 102

orf29.pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 90
          I II I M : I I I I 11 I II I I I II I I I t M I I 1 I I I I II I M Mill:: I II I II II II I
orf29ng   VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162

orf29.pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125
          I I :: I I I I I I I I : I I M I I M I M : I I II I I I I
orf29ng   SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR 222



The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein
having amino acid sequence <SEQ ID 170>:

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG
151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT
251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS
301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD
401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA
451 DGKINHRLFV PNQQLPEK*

In a second experiment, the following DNA sequence <SEQ ID 171> was identified:

1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc
51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC
201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA
251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA
301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA
351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA
401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC
451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA
501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT
551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC
601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA
651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG
701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA
751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA
801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG
851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC
901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC
951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG
1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA
1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG
1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA
1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT
1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA
1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA
1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT
1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT
1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA
1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-1>:

1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG
151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT
251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS
301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK
351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI
401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS
451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR*

ORF29ng-1 and ORF29-1 show 86.0% identity in 401 aa overlap:

10 10 20 30 40 50 60 

orf2 9ng-l .pep MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 

I | M I I I I M I : I I i I 1 : I : ! I I I M I I I I I I I M II I I II I I I I I I I I M I I M M I I : 
orf2 9-l MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK 

10 20 30 40 50 60 

15 

70 80 90 100 110 120 

orf2 9ng-l .pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 

II I I I II I M I I : I : I I 1 II I I I II I I I I I I I I M M I I I I II I I I I : I I I I I I I I I I I 
or f 2 9-1 RVYAVQT FDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

20 70 80 90 100 110 120 

130 140 150 160 170 180 

orf2 9ng-l - pep GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPE PQGARDIYSYHIKGTSTKTKINTVPQ 
I 1 | | | | I I | I I I I I II I I I I I I I I I I II : I t i I I I I I I I I : : II I I I I I I I I I I 
25 orf2 9-l GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTOIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 2 9ng-l . pep APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG 
30 I I I I I I I I I I I I I 1 1 I I I : I I I I I I I I I I I I : I I : I I I I I I 1 I I : I I I M M I I I I I I 

orf 29-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 

190 200 210 220 230 240 

250 260 270 280 290 300 

35 orf 29ng-I . oeo FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 

I | | II I I I I I I II I I M I I I I I M I I I I I I I I I : I I I I I I I I I I I I I M I I I I I I I I I I I 
orf 29-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 

40 310 320 330 340 350 360 

orf29ng-l.peo ARQW ADAH PN I T AT AQT ALAV AEAAGT VWRGKKVE LN PTKWDWVKNTG YKKPAARHMQTV 
I : I I I I 11 I i I I II I I 1 I I :: I I I i I I I I I I I I I I I I I I I I I I I I I I I 1 I I I ! I I I M I : 
orf 2 9-1 AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELN PTKWDWVKNTG YKKPAARHMQTL 

310 320 330 340 350 360 

45 

370 380 390 400 410 419 

orf29ng-l .Dep DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP 

I I I I M I I : I I I : : I : : : : I : : : : : : : : : 

orf 2 9-1 DGEMAGGNKPIKSLPNSAAEKRKQN FEKFNSNWSSASFDSVHKT LTPNAPGILSPDKVKT 

50 370 380 390 400 410 420 

420 430 440 450 460 470 479 

orf 2 9ng-l . pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY 

55 orf 2 9-1 RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN 

430 440 450 460 470 480 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 21 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 173>: 




1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC 
101 ACACGCGGGC AGATGCACCG ATGCAG . . . 
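
For downstream analysis the numbered listing format used throughout this document has to be converted to a continuous string. The following is a minimal sketch of such a conversion, assuming that every row begins with a position index and that bases are limited to A, C, G, T and N; `parse_numbered_block` is a hypothetical helper name. Position indices and the trailing dots marking an incomplete sequence are discarded.

```python
import re


def parse_numbered_block(text):
    """Join the base groups of a numbered listing into one uppercase string.

    Tokens that are not made up purely of A/C/G/T/N (position indices,
    dots marking a truncated end) are skipped.
    """
    groups = []
    for line in text.splitlines():
        for token in line.split():
            if re.fullmatch(r"[ACGTNacgtn]+", token):
                groups.append(token.upper())
    return "".join(groups)


# The partial ORF30 sequence <SEQ ID 173> as printed above.
block = """\
1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAG . . .
"""
seq = parse_numbered_block(block)
print(len(seq))  # 126
```

The 126 bases correspond to 42 codons. Note that this simple filter would also drop groups containing other IUPAC ambiguity codes, so it would need extending for listings that use them.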

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ..

Further work revealed the complete nucleotide sequence <SEQ ID 175>:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>:

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)
ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N. meningitidis:

                  10        20        30        40
orf30.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
          |||||||||||||||||||||||||||||||:||||||||||
orf30a    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP
                  10        20        30        40        50        60

orf30a    LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI
                  70        80        90       100       110       120
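
The identity figures quoted for each alignment can be reproduced by counting matching columns. The sketch below is illustrative only and assumes two already-aligned strings of equal length, with "-" marking a gap; `percent_identity` is a hypothetical helper and is not the alignment program used to generate the figures in this document. Applied to ORF30 and the first 42 residues of ORF30a, it returns the 97.6% quoted above (41 identical residues out of 42).

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue.

    Gap columns ('-') are never counted as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


orf30 = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ"
orf30a_prefix = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQ"
identity = percent_identity(orf30, orf30a_prefix)
print(f"{identity:.1f}% identity over {len(orf30)} aa")
```

The single difference is the M/V substitution at position 32, which is the column marked ":" in the alignment.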

The complete length ORF30a nucleotide sequence <SEQ ID 177> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This encodes a protein having amino acid sequence <SEQ ID 178>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap:

orf30a.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 60
           | }[ | | I 1 ! | I | I I I I I I I If I I I I I I I I I t II M M ! I I I I It II I I I I M I i mil
orf30-1    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

orf30a.pep LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120
           I | | | | |M Ml I I I III IN I III I Mi ill I I Ml tit I I II M I 1 IN I II I I I I I
orf30-1    LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30a.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180
           I || | | M II Ml I I I I III M I III I I I I I I Ml I I I II I Ml I III Ml I I I I Ml III
orf30-1    KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30a.pep FX
           I I
orf30-1    FX

Homology with a predicted ORF from N. gonorrhoeae

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N. gonorrhoeae:

orf30.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 42
          |||||||||||||||||||||||||||||||:||||||||||
orf30ng   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC
51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG
151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT
301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA
351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG
401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT
451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA
501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA

This encodes a protein having amino acid sequence <SEQ ID 180>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG
101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN
151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF*

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap: 

10 . 20 30 40 50 60 

orf 30ng . oep mKKQITAAVMMLSMIAPAMAInIGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 

I ! I I || I I II M II I I M ! I II I ! I I I I II II II M II I II I I M II i I M 1 I II I III I 
orf 30-1 mKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 

10 20 30 40 50 60 

70 80 90 100 110 

orf30ng.pep LAILGGAAIGMWTQHGFSYATTGRPASVRDVA — GGLGAIPG DVGAAGKWS FAKYGREI 
! I I I II I I M M I I M I M II I III I II M II I M I I 1 I I I I M II I I I M I I I M I 
orf 30-1 LAI LGGAAIGMWTQHGFSYATTGRPAS VRDVAIAGGLGAI PGGVGAAGKWSFAKYGREI 

70 80 90 100 110 120 

120 130 140 150 160 170 

orf 30ng . pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 

II || I II M II II I M M I II Ml I II M I I II I I II I II I M II I I I I I II M II I I II 
orf 30-1 K1GNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 

130 • 140 150 160 170 180 



60 



orf 30ng . pep 
orf30-l 



180 
FX 
I 1 
FX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 22 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 181>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT
51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT
151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT
201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA
251 TT..

This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH 
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI . . 

Further work revealed a further partial nucleotide sequence <SEQ ID 183>:

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 

201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT..

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI..

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. gonorrhoeae

ORF31 shows 76.2% identity over a 84aa overlap with a predicted ORF (ORF31.ng) from N. gonorrhoeae:

orf3l .pep MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60 
I I I I I I I I I I I I I I I 1 I I II I I I I I I I II II I I I II :: I I I I I II : : i 

30 orf31ng MNKTLYRVIFNRKRGAWAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAF 54 

orf31 .pep SFSLLGFSLCLAVGTXNIAFADGI 84 

II 11111111:11 I I I I { I I I 
orf31ng C FS ALG FSLCLALGTVN I AFADG 1 1 T DKAAPKTQQAT ILQTGNG I PQVN I QT PTSAGVS V 114 

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is:

1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT
51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT
151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT
201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG
251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG
301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA
351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA
401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG
451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC
501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG
551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT
601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA
651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG
701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA

This encodes a protein having amino acid sequence <SEQ ID 186>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming
hemolysin-like HecA protein from Erwinia chrysanthemi (accession number L39897):

0^31nq 96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154 

GNG+P VNI TP ++G+S N+Y F+V NRG ILNN + T +QLGG IQ NP L 
HecA 4 5 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104 

15 Orf31nq 155 ARVWNQINSSHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQ 214 

A ++N++ S + S+L GY+EV G+ A W+ANP GI +G GF+N R TLTTG PQ+ 
HecA 105 AAAILNEWSPNRSRLAGYLEVAGQAANWVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164 

20 Orf31nc 215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 24 2 

AG SG +R G+ +1 G GLDA +D+ 
HecA 165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193 

Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap: 

10 20 30 40 50 60 

25 o^-fSl- 1 pep MNKTLYRVIFNRKRGAWAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS 

| ! i M I ! I I M M I M M I M I I II ! I I I M I I ! I I! : : I M I t II 1:1 

or^31nq MNKTLYRVI FN RKRGAWAV AETTKREGKS CAD SGSGSVYVKSVS FIPTH SKAFC 

10 20 30 40 50 



                    70        80
orf31-1.pep FSLLGFSLCLAVGTANIAFADGI
            II I I I I If I I : I I : I ! I I I M I
orf31ng     FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN
               60        70        80        90       100       110

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.
Example 23 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 187>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG . . 

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>:

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG
351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG
401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG
451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC
501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA
651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC
951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC
1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG



This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL
101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG
151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR
201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW
351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*


35 



40 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF32 shows 93.8% identity over an 81aa overlap with an ORF (ORF32a) from strain A of N. meningitidis:

                     10        20        30        40        50        60
orf32.pep    MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
             ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a       MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                     10        20        30        40        50        60

                     70        80
orf32.pep    CVHQDIHVRTWHSDAADIDTA
             |||||||||||||||||||||
orf32a       CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                     70        80        90       100       110       120


The complete length ORF32a nucleotide sequence <SEQ ID 191> is:

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT

151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC

251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG

301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG

351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA

401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA

451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC

501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC

551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA

601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA

651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG

701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG

751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT

801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT

851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC 

901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC 

951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA 

1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG

1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC

1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG

This encodes a protein having amino acid sequence <SEQ ID 192>:

1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL
101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG
151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR
201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW
351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*

ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap:

                     10        20        30        40        50        60
orf32-1.pep  MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
             ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a       MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf32-1.pep  CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE
             ||||||||||||||||||||||| ||||||||||||||||||||||||||| |||||||
orf32a       CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf32-1.pep  SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS
             |||||| ||||||:| | |||||||| |||||||||||||||||: |||:||||||||
orf32a       SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf32-1.pep  EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA
             |||||||||||||||||||||||||:||||||: |||||||:||||||||||||||||||
orf32a       EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf32-1.pep  SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
             ||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||
orf32a       SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf32-1.pep  AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY
             ||||||||||||||:|||||||||||||||||||||||||| ||||||||||||||||||
orf32a       AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY
                    310       320       330       340       350       360

                    370       380
orf32-1.pep  LFGQPSAPEKLAAFVSKHQKIRX
             ||||||| |||||||||||||||
orf32a       LFGQPSASEKLAAFVSKHQKIRX
                    370       380

Homology with a predicted ORF from N. gonorrhoeae

ORF32 shows 95.1% identity over an 82aa overlap with a predicted ORF (ORF32.ng) from N. gonorrhoeae:

orf32.pep      MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP  57
               |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng      MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP  60

orf32.pep    DVPCVHQDIHVRTWHSDAADIDTA  81
             ||| ||||||||||||||||||||
orf32ng      DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS  120

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD
301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence <SEQ ID 195>:

1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA

51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC 

101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG 

151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT 

201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG

251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG 

301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT 

351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG

401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC

451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA

501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC

551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG
601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT 
651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg 

701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC

751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT

801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT

851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC

901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC

951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG

1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC 
1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC 
1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT 
1151 AG 

This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:

1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL

51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV

101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG

151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW

201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF

251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL 

301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG 

351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR* 

ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:

                      10        20        30        40        50       59
orf32-1.pep  MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
             |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng-1    MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
                     10        20        30        40        50        60

            60        70        80        90       100       110      119
orf32-1.pep  PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE
             | ||||||||||||||||||||||||:||||||||||||||:||||||||||||||||||
orf32ng-1    PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE
                     70        80        90       100       110       120

           120       130       140       150       160       170      179
orf32-1.pep  ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
             |||||||||||||||||||||||||||||||||||||| |||||||||||:||:||||||
orf32ng-1    ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA
                    130       140       150       160       170       180

           180       190       200       210       220       230      239
orf32-1.pep  SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT
              ||||||||:|||||||:||:|||| |||||||:||||||||||||||:||||:| ||||
orf32ng-1    PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT
                    190       200       210       220       230       240

           240       250       260       270       280       290      299
orf32-1.pep  ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL
             |||||||||||||||||:||||||||||||||||||:|||||||||||||||||||||||
orf32ng-1    ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL
                    250       260       270       280       290       300

           300       310       320       330       340       350      359
orf32-1.pep  HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
             |||||||:|||||||:|:|| |||||||||||||||||||||||||||||||||||||||
orf32ng-1    HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
                    310       320       330       340       350       360

           360       370       380
orf32-1.pep  YLFGQPSAPEKLAAFVSKHQKIRX
             |||||||| |||||||||||||||
orf32ng-1    YLFGQPSASEKLAAFVSKHQKIRX
                    370       380

On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
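
The RGD tripeptide referred to above can be located with a simple motif scan. This is a sketch under stated assumptions: `find_motif` is a hypothetical helper, positions are reported 1-based to match the numbering of the listings, and only residues 151-200 of each protein are scanned.

```python
def find_motif(sequence, motif, first_position=1):
    """Return the positions (numbered from first_position) where motif starts."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(first_position + start)
        start = sequence.find(motif, start + 1)  # allow overlapping hits
    return hits


# Residues 151-200 of the gonococcal protein ORF32ng-1 <SEQ ID 196>.
orf32ng_1_151_200 = "".join([
    "GLIRERDYRE", "AVRFDTEALR", "RRLVLPEKNA", "PEWLLFGYRG", "DVWAKWLDMW",
])
# Residues 151-200 of the meningococcal protein ORF32-1 <SEQ ID 190>.
orf32_1_151_200 = "".join([
    "LIRERDYCEA", "VRFDTEALRE", "RLMLPEKNAS", "EWLLFGYRSD", "VWAKWLEMWR",
])

print("ORF32ng-1 RGD at:", find_motif(orf32ng_1_151_200, "RGD", 151))
print("ORF32-1   RGD at:", find_motif(orf32_1_151_200, "RGD", 151))
```

In this region the gonococcal sequence reads YRGD where the meningococcal sequence reads YRSD, so the motif is found only in the former.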

ORF32-1 (42kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that 
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen. 
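
The quoted size of about 42kDa for ORF32-1 is consistent with a rough estimate from its amino acid composition. The sketch below is an illustration only: `approx_mass_da` is a hypothetical helper, it uses commonly tabulated average residue masses, and it ignores the fusion tags used for expression.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water per chain for the free N- and C-termini


def approx_mass_da(protein):
    """Approximate average mass of an untagged polypeptide, in daltons."""
    residues = protein.replace(" ", "").rstrip("*")
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


# ORF32-1 <SEQ ID 190>, written in rows of 50 residues.
ORF32_1 = "".join([
    "MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALR",
    "ALCPDLPDVPCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVL",
    "HIIRRHKPLWLNWEYLSAEESNERLHLMPSPQEGVQKYFWFMGFSEKSGG",
    "LIRERDYCEAVRFDTEALRERLMLPEKNASEWLLFGYRSDVWAKWLEMWR",
    "QAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTASVRLVKIPFV",
    "PQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH",
    "AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGW",
    "RQGAEDWSRYLFGQPSAPEKLAAFVSKHQKIR",
])

print(f"{len(ORF32_1)} residues, about {approx_mass_da(ORF32_1) / 1000:.1f} kDa")
```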

Example 24 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 197>:

1 . TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG 

51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG 

101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC 

151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT 

201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA

251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA

301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA 

351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA 

401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG..

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>:

1 ..LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK

101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL..

Further work revealed the complete nucleotide sequence <SEQ ID 199>:

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA 

51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC 

101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG 

151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG 

201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT 

251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG 

301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT 

351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG

451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC

501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC

751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA 

951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC

1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG 

1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC

1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A 

This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>:

1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM
51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of N. meningitidis:

                                                   10        20        30
orf33.pep                                  LFLRVKVGRFFSSPATWFRXKDPVNQAVLR
                                           ||||||||||||||||||| ||||||||||
orf33a       LMDNQGLNFFLVLAGVXGMNTLMLAVWLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR
            90       100       110       120       130       140

                     40        50        60        70        80        90
orf33.pep    LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA
             || ||||| ||||||| ||||||||||||||||||||||||||||||||||||::::|||
orf33a       LYADEWRXPSVRWKIGATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLGDSSSVRL
           150       160       170       180       190       200

                    100       110       120       130       140
orf33.pep    VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL
             ||||||||:||||||||||:|||||||||||||||||||||| |||| ||||||
orf33a       VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRLLAWAVCK
           210       220       230       240       250       260

orf33a       ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE
           270       280       290       300       310       320




The complete length ORF33a nucleotide sequence <SEQ ID 201> is:

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA
51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG
151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT
251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC
651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT
801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC
851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG
1001 AATGGTTCGA GGGCIkGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG
1151 TGTTGCGGCA GAT CGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC
1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA



This encodes a protein having amino acid sequence <SEQ ID 202>:

1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM
51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL
101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT*



ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap:

                     10        20        30        40        50        60
orf33a.pep   MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET
             |||||||||||||||:||||||||||||||||||||||||||||||||:|||||||||||
orf33-1      MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf33a.pep   LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML
             |||||||||||||:|||||| | |||||||||||||||||||||| ||||||||||||||
orf33-1      LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf33a.pep   FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML
             |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf33-1      FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf33a.pep   VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA
             |||||||||||||||||||||::::||| ||||||||:||||||||||||||||||||||
orf33-1      VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf33a.pep   DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA
             |||||||||||||||||||||||||:|||||  ||||||||||      |||||||||||
orf33-1      DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf33a.pep   DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE
             ||||||||||||||:|||||||||||||||||||||||||||||||||||:|||||||||
orf33-1      DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf33a.pep   TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVXLLAEQGLSDDLSEKLEHW
             ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf33-1      TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
                    370       380       390       400       410       420

                    430       440       450
orf33a.pep   RNALTECGAAWLEPDRAAQEGRLKTNDRTX
             ||||:|||||||||||||||||||
orf33-1      RNALAECGAAWLEPDRAAQEGRLKDQX
                    430       440

Homology with a predicted ORF from N. gonorrhoeae 

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from N. gonorrhoeae:

orf33.pep                                  LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
                                           ||||||||||||||||||| | ||||||||
orf33ng      LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR  100

orf33.pep    LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
             || |:||  ||||||| |:||||||||||||||||||||||||||||||||||||||||||
orf33ng      LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  160

orf33.pep    VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL  143
             |||||||||||||||||||:|||||||||||||||||||||| ||:| ||||||
orf33ng      VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK  220




An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino acid sequence <SEQ ID 204>:

1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF
51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR
101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST
151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL
201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD
251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV
301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA
351 VVQLLAEQGL SDDLSEKLEK WRNALTECGA AWLEPDRVAQ EGRLKDQ*



Further sequence analysis revealed the following DNA sequence <SEQ ID 205>:

1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa
51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc
101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg
151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg
201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT
251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA
301 GTTTTggcgG GAGTGTtggG CATGaatacG CtgATGCTGG CAGTATGGtt
351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC
501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC
651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT
801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA
851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA
951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG
1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC
1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A

This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>:



1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM
51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL
151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ*

ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap:

                     10        20        30        40        50        60
orf33-1.pep  MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET
             |||||||||||||||::|||||||||||||||||||||||||||:||||||||:||||:|
orf33ng-1    MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMIDRDRMLRDT
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf33-1.pep  LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML
             ||||||||||||||:|:: | :||| |||||||||||||||||||||||||||||||| |
orf33ng-1    LERVRAGSFWLWVVVASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf33-1.pep  FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML
             |||||||||||||||||||| ||||||||||||:|||||||||||||:||||||||||||
orf33ng-1    FLRVKVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf33-1.pep  VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf33ng-1    VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf33-1.pep  DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA
             |||||||||||||:||||||||||||||||||||||||||||| ||||||||||||||||
orf33ng-1    DARAWSGLLVGSIVCYGILPRLLAWVVCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf33-1.pep  DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE
             ||||||||||||||:||||||||:|||||||||:||||||||||||||||:|||||||||
orf33ng-1    DTRRETVSAVSPKIVLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE
                    310       320       330       340       350       360

                    370       380       390       400       410       420
orf33-1.pep  TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf33ng-1    TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
                    370       380       390       400       410       420

                    430       440
orf33-1.pep  RNALAECGAAWLEPDRAAQEGRLKDQX
             ||||:|||||||||||:||||||||||
orf33ng-1    RNALTECGAAWLEPDRVAQEGRLKDQX
                    430       440

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
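
The method used to predict the transmembrane domains is not stated here. One common approach, shown purely as an illustrative sketch, is a Kyte-Doolittle hydropathy scan: the mean hydropathy of a sliding 19-residue window is computed, and windows averaging above about 1.6 are taken as candidate membrane-spanning segments. The helper name `hydropathy_windows`, the window length and the threshold are assumptions of this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence, window=19, first_position=1):
    """Return (start position, mean hydropathy) for every full window."""
    scores = [KD[residue] for residue in sequence]
    return [
        (first_position + i, sum(scores[i:i + window]) / window)
        for i in range(len(scores) - window + 1)
    ]


# Residues 151-200 of ORF33ng-1 <SEQ ID 206> (contains a hydrophobic stretch).
core = "YADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTL"
# Residues 421-446 of ORF33ng-1 (hydrophilic C-terminus), for comparison.
tail = "RNALTECGAAWLEPDRVAQEGRLKDQ"

THRESHOLD = 1.6  # assumed cut-off for a candidate membrane-spanning window
for name, seq, first in (("151-200", core, 151), ("421-446", tail, 421)):
    hits = [pos for pos, mean in hydropathy_windows(seq, 19, first)
            if mean >= THRESHOLD]
    print(name, "candidate window starts:", hits)
```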

Example 25 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>: 

1 . . CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 

51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC . GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG. . GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC

351 GGGTTGGGCG GCATCTTGT1 CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC.

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 ..QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV

51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS

101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 

151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT 

351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT 

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT

451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT

501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA

551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC

601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG 

651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG 

701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT
751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT 

851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA
901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 
951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGG GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA 

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>:

1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL
51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG
101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN
151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV
201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG
251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP
301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV
351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF
401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR
451 HAV*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of N.
meningitidis:

10 20 30 

orf34 Deo OKSLSR ISLWGLGGVFFGVSGLV WFSLG VSXE CAC 

' F * II III 1 I I M 11 II II I M I I I M II It Ml 

or f 3 4 a MMXPXIMLPWIAGVPA VPGQKRLSR XSLWGLGGXFFGVSGLVW FSLG VSXSLGVSXGCAC 
j5 ^ 10 20 30 40 50 60 

40 50 60 70 80 90 

0^f34 oep FSGVSFRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA 

TTTTl II I! I I I I I I I I I I I I I I II I II : I : : : I : : I I I I I I 

20 orf34a rsGVSFRGSGRG TFVGSTGVSLSVFSACA PASSGCLSVXAVSAGCGLTRXFXGA 

70 80 90 100 110 

100 110 120 130 140 150 

or ^34 pec- ^gdvtllplssvpsgcagsdeaawwcsgwaascpttpfgsqnsvsrglsvccgsaxrvls 

?5 " * i il II I M I I I I I I I : I I I I I II M I I I I II I I I I II I M M I I I II I : M I I 

or f34a AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 



30 



orf 34 . pep 
erf 34a 



P^XMVT.TMPIANAPMAVIOMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 



The complete length ORF34a nucleotide sequence <SEQ ID 211> is:

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT
51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT
151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG
201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG
251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT
301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA
351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG
401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG
501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC
551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC
751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG
851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG
901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG
951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG
1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT
1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG
1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC
1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG
1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA

This encodes a protein having amino acid sequence <SEQ ID 212>:



1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX
51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA
101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR
201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD
251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA
301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL
351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS
401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS
451 DGIALRHAV*



ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 






10 20 30 40 50 60 

or f 34 a pep MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC 
1 J | | | I I I II i I I I I I M : M II I I I I I I I I i I I I I I ! I 1 I I I I i I I I I 

orf 34-1 MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL GCAC 

10 20 30 40 50 

70 80 90 100 110 120 

orf 34a pep FSGVSFRGSGRGT FVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP 

I I M I I i I M I I M I t I I I I I I I I I I I i : I I I I I I I I I i M I I I I I II I I MINIM 
0*^34-1 FSGVSFRGSGRGT FVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 

60 70 80 90 100 110 

130 140 150 160 170 180 

orf 34 a. per> LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV 
I M I I I II I 11 I : I I I M I II I II I II II I I II II I I I I I I I I I M I I M M I M 
orf 34-: LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
120 130 140 150 160 170 

190 200 210 220 230 240 

orf 34a. peo LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA 
I I I || I } I M I : I M I I I 1 M M M I I I I I I I I I I M I I ! I I II M M I II I M I I II I 
orf 34-1 LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
180 190 200 210 220 230 

250 260 270 280 290 300 

orf 34a . Deo LDWXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
MM I II I M M M I I 11 M M M 11 I I II II II I I 1 I M I I 1 II II M II M I M II I 
orf34-l L DW LVEG DD FLY ADGGADFLGNLRL FFGGE DAHN VG YVAVGND FD ARLCGGADAQQRGA 

240 250 260 270 280 290 

310 320 330 340 350 360 

orf 34a . oep DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAWA 
M I I I II II M I M II M M I II : M I I M I I I I II I I I I M : I I M M M II I I I M I 
orf 34-1 DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 34a. peD DNGDLGRVXFGLWLAQIGAGGGFDTQRHYVWGXRAGGSAVDGGFRADRRAADDCADAA 
I : I II i I I M I M II 1 M : I I M II I 11 I I I I I I I I 1 I I I II I M I Ml II 11 I 
orf 34- i DDGDLGRVAFGLWLAQIGTGGGFDTQRHNWVGLRAGGSAVDGGFRADGGASDYCADAA 
360 370 380 390 400 410 

430 440 450 460 

orf34a,pep AEGKAE DGGS QGADG VR FG FHRVL P FLGVS DG I ALRHAVX 

I : I I I I : I I : II II II I I I I 11 M M II I I M I II I M I I 
orf 34- I AKGKAENGGNQG ADGVRFG FHRVLP FLGVS DG I ALRHAVX 

420 430 440 450 

Homology with a predicted ORF from N. gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N.
gonorrhoeae:

orf34.pep QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE CAC 35 
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10 



orf 34ng 
orf 34 . pep 
orf 34ng 
orf34 .pep 
orf 34ng 
orf 34 .pep 
orf 34ng 



II I I I I I I I I I : I I I I I I I M I I I I I I I I 111 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 



FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV 

M | ! M II I I I : I II M I I I I I II M I I : I I 

FSGVSFRGSGWGAFVGSTGVSLSVFSACVP 



60 



90 



- - GRLXX LT R F FLG A 
II I I I I I II I 
VPVNESAARAASEGR — GLTRFFLGA 114 



AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 

Mi | | | | | | | M I I II I I I I II I II I I I I I I I : M I I I I II I I I I I I M I I : MM 
AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 



150 



174 



S l* 75 
PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 



The complete length ORF34ng nucleotide sequence <SEQ ID 213> is:

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT
51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT
151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG
201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG
251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC
301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA
351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG
401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG
501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC
551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC
751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG
851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG
901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG
951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG
1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT
1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg
1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC
1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG
1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA



This encodes a protein having amino acid sequence <SEQ ID 214>: 



1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF
51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA
101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR
201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND
251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA
301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL
351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS
401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS
451 DGIALRHAV*



ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap:






orf 34-1 . pep 
orf 34ng 

orf 34-1 .pep 
orf 34ng . 



10 20 30 40 4 50 
MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS LGCAC 

| | | I || II I ! II I I M M II : I I I II M Ml : I M M I M 1 I M I I I M Mill 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 

10 20 30 40 50 60 

60 70 80 90 100 110 

FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 

| | || M M I I I: I M II I M I II I I II I I : • ■ ■ Ml I II 11 II ! II I II I II 
FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP 
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70 80 90 100 110 120 

120 130 140 150 160 170 

orf 34-1 . pep LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
5 I I I I I M t I I I I I I I I M I I M I I 11 I I I : I II I I I I I M I I I I i I I I : I M I I I I I I I 

orf34ng LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 

130 140 150 160 170 180 

180 190 200 210 220 230 

10 orf 34-1 . pep LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

I M I I ! M II : I I I I I I I I I I I I I I I I I I I I I I I I I I M I I 1 I I II I I I I I I ! I I I I I I 
orf34ng LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

190 200 210 220 230 240 

15 240 250 260 270 280 290 

orf 34-1 . pep LDWLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
I I ! I I M I : t I I I II I I I I I i I I I I I I I I I I I I I I M I : I I I I I I I It I I : I I I I I I M I 
orf34ng LDWLVEGNDFLYADGGADFLGNLRLFFGGEDAHNVGYIAVGNDFDARLCSGADAQQRGA 

250 260 270 280 290 300 

20 

300 310 320 330 340 350 

orf 34-1 . pep DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 
111 I I I I I I II I I I I I I I I 11 : I I : I I I I I I I I I M I I II I I I II I I I M II I I I I I I 
orf34ng DFGRVPSVAG DVARSARQGGDGNWVYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 

25 310 320 330 340 350 360 

360 370 380 390 400 410 

orf 34 - 1 . pep DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVWGLRAGGSAVDGGFRADGGASDYCADAA 
I II II I I I I II I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I I I II I I I I : I 11:11 
30 orf3 4na DDGDLGRVAFGLWLAQVGTGGGFDTQRHNWIGLRAGGSAVDDGFCADGGPADDCAEAA 

370 380 390 400 410 420 

420 430 440 450 

orf 34-1 . pen AKGKAENGGN QG ADGVRFG FHRVLPFLGVS DG I ALRHAVX 
35 i : I I I I : i I ! I ! II I I I I ! II I I I I 1 I II I I I I II 1 I I 

orf34nc AEGKAEDGGNQGADGVW FGFHRGLPFLGVSDG I ALRHAVX 

430 440 450 460 

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
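
The source does not state which algorithm was used to predict the transmembrane domains. Purely as an illustration, the following Python sketch scans a protein sequence with a sliding-window mean of Kyte-Doolittle hydropathy values. The scale is a swapped-in, commonly used technique, and the 19-residue window, the 1.6 threshold and the function names are assumptions, not the method of the source.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; unknown symbols (X, *) score 0."""
    values = [KD.get(res, 0.0) for res in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the threshold."""
    return [i + 1
            for i, score in enumerate(hydropathy_profile(seq, window))
            if score > threshold]


# A hydrophobic stretch taken from the gonococcal protein <SEQ ID 214>.
segment = "LKGLFGFFAILIVLLGCRA"
print(candidate_tm_windows(segment))
```

Windows flagged in this way are only candidates; a dedicated prediction program would normally be used to confirm them.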

Example 26 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGJAAAAAA GAAATCGTCT TCGGCACGAC

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>:

1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT 
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

2 01 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 
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251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG AC AT C ACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*
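
The relationship between the nucleotide sequence <SEQ ID 217> and the amino acid sequence <SEQ ID 218> can be checked by conceptual translation. The following Python sketch is a minimal illustration that assumes the standard genetic code and reading frame 1; the function name `translate` and the choice to report any codon containing an ambiguous base as X are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 1 of a DNA string; whitespace is ignored.

    Codons containing anything other than A, C, G or T (for example N)
    are reported as X, and stop codons as *.
    """
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


# First 30 nucleotides of <SEQ ID 217>, as printed in the listing.
print(translate("ATGAAAACCT TCTTCAAAAC CCTTTCCGCC"))
```

Applied to the first thirty nucleotides of the listing, the sketch returns MKTFFKTLSA, which agrees with the start of ORF4-1.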

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N. 
meningitidis: 

25 10 20 30 40 50 59 

or f 4 . pen MKTFFKTLSAAALALILAA CG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
I M I I I i II II I I I I I I I I I I I I I I I I I M 1 M I I ! I I 1 I I I I I I I I I I I II I! I I I 1 
orf 4a MKTFFKTLSAAALALILAA CGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE 

10 20 30 40 50 60 

30 

60 70 80 90 

or f 4 . pep QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL 
I I I I I M I I I I I I I I Mill I I I i 1 1 i I I 
orf 4a XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ 
35 70 80 90 100 110 120 

orf 4a VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX 
130 140 150 160 170 180 

The complete length ORF4a nucleotide sequence <SEQ ID 219> is:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN 

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT

501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN

551 NNNNANNNN? NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN 

601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT
51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH
101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND
151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX
201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*



A leader peptide is underlined. 



Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT

501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG 

551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG

851 GCGCAGCCAA ATAA

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap:

10 20 30 40 50 60 

35 orf 4a-l MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 

M II II I I M I II I I I II I I I I I I I I I M I II I I I I M I M ! I M I 1 I I M I I I M I I I I 
orf 4-1 MKT FFKTLSAAALAL I LAACGGQKDS APAAS AS AAADNGAAKKE I VFGTT VGDFGDMVKE 

10 20 30 40 50 60 

40 70 80 90 100 110 120 

orf 4a-l QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
IN M i I 1 M I i I I I I I I ! M I I I I I I I I 1 I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I 
orf 4-1 QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 

70 80 90 100 ' 110 120 

45 

130 140 150 160 170 180 

orf 4 a- 1 VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
I i | | M I I I I I I I I I I M I I II I I I I I I I I M I I I I I I II I I ft I 11 I I 1 I I II M I I I I 
orf 4-1 VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
50 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 4 a- 1 ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
I I M I I I I I 11 I I I I I I I I I I I I I I I I M I I I I I I I 1. 1 I I II I I I I II I I I II I I I I I I I 
55 orf 4-1 ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEAL FQEPSFAYVNWS 

190 200 210 220 230 . 240 

■250 260 270 280 

orf4a-l AVKTADKDSQWLKDVTEAYNS DAFKAYAHKRFEGYKS PAAWNEGAAKX 

60 II I I I II I I II II I I I I II I I I I I I M I I I II I I I I I I I M I I I I I I I 

orf 4-1 AVKTADKDSQWLKDVTEAYNS DAFKAYAHKRFEGYKS PAAWNEGAAKX 

250 260. 270 . 280 
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Homologv with an outer membrane protein of Pasteurella haemoliiica (accession q08869). 
ORF4 and this outer membrane protein show 33% aa identity in 91aa overlap: 

10 20 

lin? n „ ha MN FKKLLG VALVS ALALT ACKDEKAQAP 

5 P P I I I :: I I 111:11 : I : I 

ORF4 VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL — ALILAACGFKKTARPPHPL 

110 120 130 140 150 

30 40 50 60 70 80 

1 0 liD2 pasha -ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKD 

1U P F : :: j : |: :| ::|:: :: III I : I I : I I : I : : I I I I : 

ORF4 LPPPTTARRKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGE 
160 . 170 180 190 200 .210 

15 90 100 110 120 130 140 

lip2 . pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLVWPIAAYSKKIKNISELKDGATVAIPNNAS 

I 

ORF4 L 



Homology with a predicted ORF from N. gonorrhoeae

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. 
gonorrhoeae: 



orf 4mn. pep 



10 20 30 

MKT FFKTLSAAALALI LAACGXQKDSAPAA 
25 " L I I I I I i I I I : i : I I I I t II i I I I I I I I I I 

orf 4n RANAVXT PN PDGRT PCLS FLFETATTSGENMKT FFKTLST AS LALI LAACGGQKDS APAA 

200 210 220 230 240 250 

40 50 60 70 80 89 

30 orf4nm peo SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVE FTDYVRPNLALA 

| | : | : | I M I I I M I I I 1 I i I M I I I I I I I I I M I I I I M I I I I M I I H 

o y f 4 nc S AAAPSADNGAAKKE I VFGTTVGDFGDMVKEQI QAELEKKGYTVKLVE FTDYVRPNLALA 

260 270 280 290 300 310 

35 90 

orf4nm.pep EGEL 
I I I I 

orf 4nc EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 

320 330 340 . 350 360 370 

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a
protein having amino acid sequence <SEQ ID 224>:

1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA

601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA 

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC 

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG

851 AAGGCGCAGC CAAATAA

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK*

This shows 97.6% identity in 288 aa overlap with ORF4-1: 

15 10 20 30 40 50 59 

orf 4-1 . pep MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK 
M I ! I I I I M I I I I I I I I I II I I M I I I I If I : I : I i I I I I I M I I I M II I I I I II I I 
orf4ng-l MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 

10 20 30 40 50 60 

20 

60 70 80 90 100 110 119 

orf 4-1. pep EQIQAELEKKGYTVKLVE FTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF 

I I I I I I I I I I I I I I I I I I I I M I M M It I I I I I I It I I I 1 I I I I I i I I I I I II I I I I : I 
orf 4ng-i EQIQAELEKKGYTVKLVE FTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 

25 70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 4-1. pep QVPTAPLGLY PGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDG IN PLTAS 

I I I I I I I I It I 1 I I i i 1 I I I I I M I I I I I I t I I I t I I : I I I I : I I I I I I I I I I I I I I I I I 
30 orf 4ng-l QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 

130 140 150 160 170 180 

180 190 200 210 220 230 239 

orf 4-1 . Dep KADIAENLKN IKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQE PS FA YVNW 

35 I I I 1 I I I I II I I I I I I I II t I I I II I II I I I I I I I I I I I I I I I I I M M I I I I I I II I I I 

orf 4ng-l KADIAENLKN IKI VELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPS FA YVNW 

190 200 210 220 230 240 

240 250 260 270 280 

40 orf 4-1 . Dep SAVKTADKDS QWLKDVTEAYNSDAFKAY AH KRFEGYKSPAAWNEGAAKX 

I 1 I M I I I I I I I I I I ! I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I 
orf4ng-l SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

250 260 270 280 
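
The percent identity figures quoted for the alignments in this example can be reproduced from any pair of aligned sequences. The following Python sketch is illustrative only: it assumes that identity is counted as identical columns divided by the total alignment length, with gap columns (written "-") included in the length, which may differ from the exact convention of the alignment program used in the source. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue in both rows.

    Both strings must already be aligned (equal length). A column where
    both rows hold the gap character is not counted as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b)
                    if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)


# A short stretch of the ORF4-1 / ORF4ng-1 alignment around the single gap.
print(round(percent_identity("ASASA-AADNG", "ASAAAPSADNG"), 1))
```

Under this convention, 281 identical positions in a 288 aa overlap round to 97.6%, and 286 identical positions in a 287 aa overlap round to 99.7%.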

In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the
database:

ID LIP2_PASHA STANDARD; PRT; 276 AA.
AC Q08869;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . .

SCORES Init1: 279 Initn: 416 Opt: 494

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap

10 20 30 40 50 

orf 4ng-i . pep MKTFFKTLSAAAL— ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 
I I I : : I I f I 1*11 : I : I I I : : I : : : I I I I : : I : : I 

lip2_pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

60 10 20 30 40 50 

60 70 80 90 100 110 

orf 4ng-l . pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE 
: : : : II I . I : 1 ! : I I : I : : M I I :.! I I.: I I 111:: I : : : : : 
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liD2 oasha tevavkiakekygldvelvqfteytqpnaalhskdldanafqtvpyleqevkdrgyklai 

60 70 80 90 100 110 

120 130 140 150 160 170 

5 orf4nq-l pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT 

:::!::! I : I : : I : I I I : I I : I I : I I I 1 M : : I : I : I I I I I : 

1^2 pasha igntlvwpiaayskkikniselkdgatvaipnnasntarallllqahgllklkdpbcn-vf 

P - 120 130 140 150 160 170 

10 180 190 200 210 220 230 

orf 4 ng-l pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTE—ALFQEPSFA 

I : : I I II I II I M : : : : I I I I : : I I : I : : I 1 : : I : : : : : : 
lio2 Pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 

180 190 200 210 220 230 

15 

240 250 260 270 280 • 289 

orf4nq-l pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

Ml : : : t I : I : ::::::: I I I : I 

li P 2 pasha YVNLVVSREDNKDDPRLQTFVKSFQTEEVFQEALKLFNGGWKGW 
20 " 240 250 260 270 

Based on this analysis, including the homology with the outer membrane protein of Pasteurella
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
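
The source does not give the pattern used to identify the lipid attachment site. As an illustration only, the following Python sketch searches for a consensus of the kind catalogued in PROSITE for prokaryotic membrane lipoproteins; the regular expression below is recalled from entry PS00013 and should be treated as an assumption to be checked against that entry. The additional positional rules of the entry are omitted, and the function name is hypothetical.

```python
import re

# Assumed consensus: six residues other than D, E, R, K, followed by
# [LIVMFWSTAG] twice, [LIVMFYSTAGCQ], [AGS] and the lipid-modified Cys.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_cys(protein):
    """Return the 1-based position of the candidate Cys, or None."""
    match = LIPOBOX.search(protein.upper())
    # match.end() is the 0-based exclusive end, i.e. the 1-based
    # position of the last matched residue (the cysteine).
    return match.end() if match else None


# N-terminal 30 residues of ORF4ng-1 <SEQ ID 226>.
print(find_lipid_attachment_cys("MKTFFKTLSAAALALILAACGGQKDSAPAA"))
```

For the N-terminus of ORF4ng-1 the sketch reports the cysteine at position 20, within the LILAAC stretch that ends the leader region.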

ORF4-1 (30kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a
useful immunogen.

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
Example 27

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>: 

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 

51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 

101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 

151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA

201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 

251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 

301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 

351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 

401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG

451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA

501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 

551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC

601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG

651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC....

701 .......... .......... ........GC AGACACGCCC GCCGCATCCG

751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC

801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA

851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA

901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA



This corresponds to the amino acid sequence <SEQ ID 228; ORF8>: 



1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR
51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT
101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ
151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP
201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN.. .........Q
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH
301 PPQMAGCPRT PTPAPKPA*



Computer analysis of this amino acid sequence gave the following results:
Sequence motifs 

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface 
localization. Furthermore the presence of an RGD motif may indicate a possible role in bacterial 
adhesion events. 
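
Both observations, the proline content and the RGD motif, can be checked directly on the sequence. The following Python sketch is a minimal illustration; the function names are hypothetical, and the test fragment is the first 44 residues of ORF8 as they appear in the alignment with the gonococcal ORF.

```python
def motif_positions(protein, motif="RGD"):
    """1-based start positions of every occurrence of the motif."""
    protein = protein.upper()
    return [i + 1
            for i in range(len(protein) - len(motif) + 1)
            if protein[i:i + len(motif)] == motif]


def residue_fraction(protein, residue="P"):
    """Fraction of letters in the sequence equal to the given residue.

    Non-letter symbols such as '*' or '.' are ignored.
    """
    residues = [r for r in protein.upper() if r.isalpha()]
    return residues.count(residue) / len(residues) if residues else 0.0


# N-terminal fragment of ORF8 <SEQ ID 228>.
fragment = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"
print(motif_positions(fragment), round(residue_fraction(fragment), 3))
```

In this fragment the RGD motif starts at residue 11 and five of the 44 residues are proline.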

Homology with a predicted ORF from N. gonorrhoeae

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N.
gonorrhoeae:

MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR 50 

i I I II I M I MM M M M : M M M M M M II I 1 M M 
PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR 4 4 

QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP 100 
I I M M M I I M I M I M M M I II I I I I I : I I I I I I I I I I I I I 
QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT 94 

DARDER PHRRRHRHCRRQT AAAE I HT DVAFHACRQPGRLQQN DCRNQQRQ 150 

I I MMM III M II I M M II M I M II I I M I I M I I I I M I I 
HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRMQQNDCRNQQRQ 14 4 

AYDART FGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP 200 
1:1 I I I M : I I I I M I I M M M I I M M I Mi M I i 11 II M 
AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP 194 

QNRQHHRAAP DHRRQAAI SQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ 250 
I M I M M I I I M ! I I I I I II I I M I I I M I I II M I 
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orf 8ng 
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orf 8 . pep 


245 




orf 8ng 


301 




orf 8 .pep 


295 



I II I II M I II M II I M M M M I I II I I.I 1 I .1 I I I II I I I I I Mi 



319 



I I I M II I I I I I M I I I I I 



The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein
having amino acid sequence <SEQ ID 230>:

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP

201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH

301 PPQMAGCPRT PTPAPKPA*

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis 
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 28 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 231>:



1 .GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG
51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT
101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC
151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA
201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG
251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG
301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA
351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG
401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC
451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA
501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC
551 GTTATCCTTT CCCGACCGG..



This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 .EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY
51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ
101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD
151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT..

Further work revealed the complete nucleotide sequence <SEQ ID 233>: 



1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT
501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA
651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA
751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG
1751 CCGAAGGCAG GGAATATGAA CATATTTAA

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>:

1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI*
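The correspondence between a nucleotide sequence and its encoded protein can be reproduced with a simple codon-table lookup. The sketch below assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and are not taken from this document. The test input is the first 50 nucleotides of <SEQ ID 233>.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA string in frame 1.

    Digits, spaces and punctuation are dropped so that a numbered listing can
    be pasted in directly. Codons containing anything other than A, C, G or T
    (for example N) are reported as X. A trailing partial codon is ignored.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


first_row = "ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA"
print(translate(first_row))
```

Because non-letter characters are dropped, a partial sequence in which a dot marks an undetermined base would shift frame at that point; such positions should be replaced by N before translation.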

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further
computer analysis of this amino acid sequence gave the following results:
Homology with the baf protein of B. pertussis (accession number U12020).
ORF61 and baf protein show 33% aa identity in 166aa overlap:



25 



orf 61 


23 


baf 




orf 61 


78 


baf 


63 


orf 61 


132 


baf 


123 



■fL+D GNSRLK W + + A AP DL LG A R +G V G 

ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62 

EFKKAQVQEQLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131

+ + L I WL + A G+RN YR+P++ G+DRW L + 



+V S GTA T+D + D + G G I+PG +M+ +LA TA+L

baf 123 PLLVASFGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N. meningitidis:

                                                 10        20        30
orf61.pep                                EISLRSDXRPVSVXKRRDSERFLLLDGGNS
                                         ||||||| ||||| ||||||||||||||||
orf61a     TVFEGTVKGVDGQGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS
          290       300       310       320       330       340

                   40        50        60        70        80        90
orf61.pep  RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR
           |||||||||||||||||||||||||||||||||:||||||||||||||||||||||||||
orf61a     RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR
          350       360       370       380       390       400

                  100       110       120       130       140       150
orf61.pep  KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
           ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf61a     KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
          410       420       430       440       450       460

                  160       170       180      189
orf61.pep  GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT
           ||||| |||||||||||||||||||||||||||||||||
orf61a     GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM
          470        480       490       500       510       520

orf61a     HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG
           530       540       550       560       570       580

The complete length ORF61a nucleotide sequence <SEQ ID 235> is:

1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG

401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA 

751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA 

1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA

This encodes a protein having amino acid sequence <SEQ ID 236>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT*



ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap: 



10 20 30 40 50 60

orf61a.pep   MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
| | | I I ! M | ! 1 I | M ! I ! I I I I I I I I I I I M I I I I M I I I I I I M II I M I I 1 I I I 1 I I
orf61-1      MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR

10 20 30 40 50 60

70 80 90 100 110 120

orf61a.pep   LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
I i I M ! M I 1 I I I I M I ! I I I I I I I II I i I I M I I I 1 I I I M I M t I i I M I i I M I M I
orf61-1      LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK

70 80 90 100 110 120

130 140 150 160 170 180

orf61a.pep   GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN
1 1 M i II I I f I t I M M I I t I I 1 I ! I i I I 1 I I I I I I 1 I I I M t ! I I I N I I I : I i I I I I
orf61-1      GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN

130 140 150 160 170 180

190 200 210 220 230 240

orf61a.pep   DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA
I I I I I I ] I M I I I 1 I I I I I M M I I I I I I I II I 1 I i I I I M I I I I I I M I I I i I I I I I I I
orf61-1      DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA

190 200 210 220 230 240

250 260 270 280 290 300

orf61a.pep   AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG
| M | 1 | | I : | | I I I I I I II I I II I I I I I I I I I I I I I I I I I M I I I I I I I I II I I I I I M I
orf61-1      AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG

250 260 270 280 290 300

310 320 330 340 350 360

orf61a.pep   QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF
I I I I I I I I j I I I I I I I I I I i I II I I I I I I I I II I I II II I I I I I I I I I I I I I I I II I I I I
orf61-1      QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF

310 320 330 340 350 360

370 380 390 400 410 420

orf61a.pep   ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL
I | M | I I M I I I I II I II I I I : I I I I I I I I I I I 1 N I I I I I I I I I I I II I M I 1 I I I I I I
orf61-1      ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL

370 380 390 400 410 420

430 440 450 460 470 480

orf61a.pep   GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF
I I I M M I I II II I I I I I M M I I I I I I I I I II I I I I 1 I I I I I I 1 I I I I I M I II I 1 I I I
orf61-1      GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF

430 440 450 460 470 480

490 500 510 520 530 540

orf61a.pep   HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP
I | | I I I I M I I II I M M I I II I I I M I I I I II I I I I I I I I I I I I I I I I II I M I I I I I
orf61-1      HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP

490 500 510 520 530 540

550 560 570 580 590

orf61a.pep   VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX
I I I I I ! I I I I I I I I I I I I I I I I I I I I I I M I I I II : I I I I : I I I I I I II
orf61-1      VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX

550 560 570 580 590
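Percent-identity figures of the kind quoted for these alignments can be recomputed from any pair of aligned rows. The sketch below is an illustration only: the function name is hypothetical, and the convention used (count every column in which at least one row has a residue, and score a match only when both residues are identical) is an assumption that may differ from that of the alignment program used here, particularly in how gaps, X and the terminal stop symbol are treated. The test rows are the final block of the ORF61a/ORF61-1 alignment with the terminal X omitted.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity).

    Both rows must already be aligned, i.e. have equal length with gaps
    written as `gap`. Columns in which both rows carry a gap are skipped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    overlap = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap and b == gap:
            continue
        overlap += 1
        if a == b and a != gap:
            matches += 1
    if overlap == 0:
        raise ValueError("no aligned columns")
    return matches, overlap, 100.0 * matches / overlap


orf61a_tail = "VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHT"
orf61_1_tail = "VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHI"
matches, overlap, pct = percent_identity(orf61a_tail, orf61_1_tail)
print(f"{matches}/{overlap} identical ({pct:.1f}%)")
```

Applying the same function to the concatenated rows of a whole alignment gives the overall figure under the stated convention.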






Homology with a predicted ORF from N. gonorrhoeae

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. 
gonorrhoeae: 

orf61.pep                                EISLRSDXRPVSVXKRRDSERFLLLDGGNS
I I I I I I I IN II I I I It I II : I I I I
orf61ng    TVCEGTVKGVDGRGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211

orf61.pep  RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90
I I 11 I I II I I I I I I II I M II I I I I M I I I I I I I I I I I I I I I I I I I I I II I I I : I I I i I
orf61ng    RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR 271

orf61.pep  KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 150
I I I I I I I II I I I I I I I I \ 11 I I I I I I II I I I I I I I I I f I I I I I II II I I I I I I I I I I I I
orf61ng    KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 331

orf61.pep  GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189
1 I I 1 I I I II M I M I I II I I II It M M I I I I I I I I I
orf61ng    GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM 390

An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino
acid sequence <SEQ ID 238>:

1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD
51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN
101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG
151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS
201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR
251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW
301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL
351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA
401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG
451 ESEHA*

Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be: 

1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GagtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc

1101 ataCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>:

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY
151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG
301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA
401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK
501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA*

ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap:



BNSDOCID: <WO 9924578A2_I_> 



WO 99/24578 



-176- 



PCT/IB98/01665 



orf61ng-1.pep  MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60
Mill | | M I I I I I I 1 I I I II I I I I M I I I I I I I I M M I I I M I II I If I II I I I I I
orf61-1        MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60

orf61ng-1.pep  LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120
M | | | | | | I M I II : II I II I I I I ! I I I I M 11 II I I I I I I I I M I I I I II I I U I I I I I
orf61-1        LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120

orf61ng-1.pep  GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180
| | | I 1 | I If I I M I I I I II I I I I : I I I I I I I I 11 11 I I I I : I I I I I I : I II :: I I I 1 I I
orf61-1        GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180

orf61ng-1.pep  DLVVGRDKLGGILIETVRAGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240
| | | M | M I I I I I I M I I : I I I I M t I I ! I I I I I I I 11 I I I I I I I II I I I I I I I I M I 1 I
orf61-1        DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240

orf61ng-1.pep  AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300
|||Mlll:|| Ml I I 1 : : I I I I I : I I : : I I I I I I I I I I I II I I II I M I 1 I I I i I
orf61-1        AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300

orf61ng-1.pep  RGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 360
: I I I I I I I I I I : I I I I II I II I I 1:1 I I I I M I I I I I II I : I M I I I I I 1 II I I I I I
orf61-1        QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360

orf61ng-1.pep  ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 420
M I M M I 1 I I I I I I I I I I I I I I I I I I i I I I I 1 I I I I I I 1 I : I I I II I I I I I J 1 I I I I I
orf61-1        ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420

orf61ng-1.pep  GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480
M II I I I I I II I I I I I M I I I I I I I I I I I I I I Ml I I I II I I! I 1 I I I I I I I I I I I I I M
orf61-1        GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480

orf61ng-1.pep  HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 540
M I i 1 M I II I II I II I I I 1 I I I M I I I I I M I I I M I I 1 I I : M I I I I I I M : I I I M
orf61-1        HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540

orf61ng-1.pep  VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593
I I II I I I II I I I I M I I I I I I 1 I I I I I I I I I I I II : I I M : I I I I 1 I I I
orf61-1        VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593



Based on this analysis, including the homology with the baf protein of B. pertussis and the presence
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.



Example 29

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 241>:



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC
101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG
401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC..



This corresponds to the amino acid sequence <SEQ ID 242; ORF62>:

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD
201 WSVGMVLSLL YLGLGC..

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC
101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA
701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA
851 AATAA



This corresponds to the amino acid sequence <SEQ ID 244; ORF62-1>:

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL
251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK*






Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number 057 147) 
ORF62 and HI0976 show 50% aa identity in 114aa overlap:

Orf62 1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60

M YQILAL+IWSSS I K Y +DP L+V VR R KI + K 

HI0976 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Orf62 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 

L ++F NY LLQF+GLKYT SA+ SA ++GLEPLL+VFVGHFFF K + 

HI0976 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLWFVGHFFFKTKQNGF 114 
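Because HI0976 is annotated as a hypothetical transmembrane protein, a hydropathy scan is one common way to look for candidate membrane-spanning stretches in ORF62. The sketch below uses the Kyte-Doolittle scale with a sliding window; this method is substituted here for illustration and is not stated in this document to be the one used for the analysis. The function name and the default window of 19 residues are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle hydropathy for each full window along the sequence.

    Characters that are not one of the 20 standard residues (spaces, digits,
    X, the stop symbol) are skipped. Returns an empty list if the sequence
    is shorter than the window.
    """
    scores = [KD[ch] for ch in seq.upper() if ch in KD]
    if window < 1 or len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 60 residues of ORF62 as listed in <SEQ ID 242>.
orf62_start = "MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP"
profile = hydropathy_profile(orf62_start)
peak = max(profile)
print(f"highest window mean {peak:.2f} at window start {profile.index(peak) + 1}")
```

Windows with a high mean value are candidates for membrane-spanning segments; any threshold applied to the profile is a heuristic and would need to be chosen and validated separately.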

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N. meningitidis:

                   10        20        30        40        50        60
orf62.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf62.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                   70        80        90       100       110       120

                  130       140       150       160       170       180
orf62.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                  130       140       150       160       170       180

                  190       200       210
orf62.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC
           |||||||||||||||||||||||||||||||||:||
orf62a     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI
                  190       200       210       220       230       240

orf62a     SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX
                  250       260       270       280

The complete length ORF62a nucleotide sequence <SEQ ID 245> is:

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC
101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT
201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC
601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA
701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA
851 AATAA



This encodes a protein having amino acid sequence <SEQ ID 246>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD
201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL
251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK*

ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 



orf62a.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62a.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62a.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62a.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240
            |||||||||||||||||||||||||||||||||:||:|||||||||||||||||||||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62a.pep  SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285
            |||||||||||||||||||||||:||||||||||||||||||||
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285



Homology with a predicted ORF from N. gonorrhoeae

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N. 
gonorrhoeae: 






orf62.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
           |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng    MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216
           ||||||||||||||||||||||||||||||||||||
orf62ng    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240

The complete length ORF62ng nucleotide sequence <SEQ ID 247> is:

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC

501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGGGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 

751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 

851 ACGCGCAAAA CGGCAATGCC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 248>: 

1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL
251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V*
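
The correspondence between the nucleotide listing <SEQ ID 247> and the protein <SEQ ID 248> can be checked by translating the DNA codon by codon. The following is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a reading frame that starts at position 1. The names CODON_TABLE and translate are illustrative and are not part of the original disclosure.

```python
# Standard genetic code (assumed: NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace from the listing is ignored."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon, if any
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 nucleotides of the ORF62ng listing (row 1, first three blocks).
first_30 = "ATGTTTTACC AAATCCTTGC CCTGATTATC"
# Last 9 nucleotides of the listing (positions 868-876), which end in a stop codon.
last_9 = "GCCGTCTGA"

print(translate(first_30))  # MFYQILALII
print(translate(last_9))    # AV*
```

The first ten residues and the terminal residues agree with the start and end of the protein listing above, with `*` marking the stop codon.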



ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap:

orf62ng.pep    MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
               |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1        MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62ng.pep    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1        LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62ng.pep    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1        AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62ng.pep    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf62-1        AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62ng.pep    SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX 292
               ||||||||||||||||||||||||||||||||||::|||||::
orf62-1        SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285
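
The identity figure quoted for an overlap is simply the number of aligned positions carrying the same residue, divided by the overlap length. The following Python sketch illustrates this on residues 241-283 of the two sequences (the last 43 positions of the 283 aa overlap). The function name count_identities and the variable names are illustrative assumptions, and the input strings are taken from the sequence listings rather than from any alignment program.

```python
def count_identities(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length aligned strings carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A gap character ('-') never counts as an identity.
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y and x != "-")


# Residues 241-283 of ORF62ng and ORF62-1 (end of the 283 aa overlap).
tail_ng = "SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRR"
tail_62_1 = "SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQ"

matches = count_identities(tail_ng, tail_62_1)
print(matches, len(tail_ng))  # 39 43

# Over the whole overlap, 97.9% of 283 positions corresponds to 277 identities.
print(round(100 * 277 / 283, 1))  # 97.9
```

The four differences in this window (F/L, A/V, R/H and R/Q) account for most of the divergence between the two proteins, which is concentrated at the C-terminus.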

Furthermore, ORF62ng shows significant homology to a hypothetical H. influenzae protein:

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20)
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128

Score = 106 bits (262), Expect = 2e-22
Identities = 56/114 (49%), Positives = 68/114 (59%)

Query: 1 MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
M YQILAL+IW SS I K Y +DP L+V VR R KI + K
Sbjct: 1 MLYQIIALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
L ++F NY LLQF+GLKYTSA+SA ++GLEPLL+VFVGHFFF K +
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114
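
The percentages printed next to the Identities and Positives counts can be reproduced from the raw fractions. The short Python sketch below assumes, from the numbers in this report only, that the percentage is the integer part of the ratio rather than a rounded value; the function name blast_percent is hypothetical.

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage of matches over the aligned length (truncated, not rounded)."""
    return (100 * matches) // length


print(blast_percent(56, 114))  # 49
print(blast_percent(68, 114))  # 59
```

Note that 68/114 is about 59.6%, so a rounded figure would read 60%; the reported 59% is consistent with truncation.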

Based on this analysis, including the homology with the transmembrane protein of H. influenzae
and the putative leader sequence and several transmembrane domains in the gonococcal protein,
it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 30 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA
51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT
201 CGGTTCGCtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT
251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA
301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC
351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC
401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT
451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG
501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC
551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa
601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT
651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC
701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT
751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA
801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG
851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT
901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG
951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG
1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC
1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT
1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC
1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC..

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 

1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN
101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL
151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK
201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV
251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV
301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA
351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT..
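
The lower-case letters in the partial listing <SEQ ID 249> (m, w, s, k, y, r) are read here as IUPAC ambiguity codes, and the X residues in <SEQ ID 250> fall where a codon containing such a code could encode more than one amino acid. The Python sketch below illustrates that reading under two assumptions that are not stated in the original: the standard genetic code is used, and a residue is written as X whenever the possible codons disagree. The names IUPAC and translate_ambiguous are illustrative.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1, writing X where the possible codons encode different residues."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        # Expand every ambiguity code and collect the distinct residues encoded.
        residues = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[b] for b in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Nucleotides 31-60 of the partial listing (in frame, codons 11-20).
fragment = "TGCGCmGwms TCCTGkkGTA sGGACTGACG"
print(translate_ambiguous(fragment))  # CAXXLXXGLT
```

The result matches residues 11-20 of the partial protein listing. The codon GCm still yields A because both GCA and GCC encode alanine, which is why not every ambiguous base produces an X.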

Further work revealed the complete nucleotide sequence <SEQ ID 251>:

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA
51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG
251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG
351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG
401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC
451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC
551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC
601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT
701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA
751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA
851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT
951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA
1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT
1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC
1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT
1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT
1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA
1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG
1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC
1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG
1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC
1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG
2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC
2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAAAAA CTTATGCGTA G

This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP
151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA
551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK
701 TVKTYA*
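
A quick consistency check between the nucleotide listing <SEQ ID 251> and this protein is that the DNA length, divided by three and less the stop codon, should equal the protein length. The Python sketch below derives the DNA length from the label of the last listing row (2101) and the number of bases printed on that row (21); the helper name listing_length is an illustrative assumption.

```python
def listing_length(last_row_start: int, last_row_bases: int) -> int:
    """Total sequence length from the last row label and the bases printed on that row."""
    return last_row_start - 1 + last_row_bases


nucleotides = listing_length(2101, 21)   # last row: "2101 ACGGTAAAAA CTTATGCGTA G"
codons, remainder = divmod(nucleotides, 3)
residues = codons - 1                    # the final codon (TAG) is a stop

print(nucleotides, codons, remainder, residues)  # 2121 707 0 706
```

The 706 residues agree with the protein listing, whose last row starts at position 701 and holds six residues followed by the stop marker.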

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N. 
meningitidis: 

orf64.pep      MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60
               ||||||||||||  |  |||||||||||||||||||||||||||||||||||||||||||
orf64a         MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60

orf64.pep      DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 120
               |||||||||  ||     |||||| ||||||||  |||||||||||||||||||||||||
orf64a         DRRDGVFGSQIAKR-LSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLN 119

orf64.pep      LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180
               ||||||||||||||||| ||||| ||||||| ||||||||||||||||||||| ||||||
orf64a         LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE 179

orf64.pep      KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 240
               |||||||||||||||||||||| |||||| ||||||||| ||||| || |||||||||||
orf64a         KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP 239

orf64.pep      VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 300
               |||||||||||||||||    |||||||||||||||||||||||||||||||||||||||
orf64a         VPKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 299

orf64.pep      EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360
               |||||||||||||||||||||||||||||||||| ||||||||||||| |||||||||||
orf64a         EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 359

orf64.pep      ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT 394
               ||||||||||||||||||||||||||||||||
orf64a         ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL 419

orf64a         LAEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQ 479



The complete length ORF64a nucleotide sequence <SEQ ID 253> is:

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA
51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG
251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG
351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG
401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC
451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC
551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC
601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT
701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA
751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA
851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA
901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT
951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA
1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT
1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC
1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT
1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT
1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA
1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG
1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG
1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC
1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG
2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC
2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G



This encodes a protein having amino acid sequence <SEQ ID 254>: 



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP
151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX
551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK
701 TVETYA*
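
The sequence listings in this document are printed as rows of ten-residue blocks preceded by the position of the first residue in the row. To compare or count residues it can help to join such a listing back into one continuous string. The Python sketch below is one illustrative way to do so; the function name parse_listing is an assumption, and the sample text is the last two rows of <SEQ ID 254>.

```python
def parse_listing(text: str) -> str:
    """Join a block-formatted sequence listing into one string, dropping row labels."""
    blocks = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]          # discard the leading position label
        blocks.extend(tokens)
    return "".join(blocks).rstrip("*")   # the trailing '*' marks the stop codon


rows = """651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK
701 TVETYA*"""

sequence = parse_listing(rows)
print(len(sequence), sequence.count("X"))  # 56 3
```

Here the 56 residues run from position 651 to 706, and the three X residues correspond to codons containing an undetermined base (N) in the strain A nucleotide listing.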



ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 



orf64a.pep     MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1        MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60

orf64a.pep     DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 120
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1        DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 120

orf64a.pep     SKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIEK 180
               |||||||||||||||| ||||| ||||||| |||||||||||||||||||||||||||||
orf64-1        SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 180

orf64a.pep     SINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQPV 240
               ||||||||||||||||||||| |||||| ||||||||| ||||| |||||||||||||||
orf64-1        SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 240

orf64a.pep     PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 300
               ||||||||||||||||    ||||||||||||||||||||||||||||||||||||||||
orf64-1        PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 300

orf64a.pep     PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 360
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1        PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 360

orf64a.pep     RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 420
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1        RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 420

orf64a.pep     AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQK 480
               ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf64-1        AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK 480

orf64a.pep     EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK 540
               ||||||||||||||||||||||||||||||||||||||| |||||||||||| |||||||
orf64-1        EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 540

orf64a.pep     EMVEAFRNYXRSPSXQLENQDLNALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ 600
               ||||||||| ||||  |||||||||||||||||||||||||||||||||  |||||||||
orf64-1        EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 600

orf64a.pep     VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 660
               |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf64-1        VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 660

orf64a.pep     PAGTGLXLPVVKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX 707
               |||||| ||||||||||||| ||||||||||| ||||||||| ||||
orf64-1        PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 707



Homology with a predicted ORF from N. gonorrhoeae

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N.
gonorrhoeae:

orf64.pep      MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60
               ||||||||||||  |  ||||||||||||||||||| |||||||||||||||||||||||
orf64ng        MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 60

orf64.pep      DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 120
               ||| |||||  ||     |||||| ||| ||||  |||||||||||||||||||||||||
orf64ng        DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119

orf64.pep      LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180
               |||||| ||||||  ||||||||||| ||| | || ||||||||||||||||| ||||||
orf64ng        LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 179

orf64.pep      KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 240
               ||||||  ||| | |  || ||  |||| ||||||||||||||||||| |||||||||||
orf64ng        KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 239

orf64.pep      VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 300
                |  || ||||||||||||||||||||||||||| |||||||||||||||||||||||||
orf64ng        IPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFV 299

orf64.pep      EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360
               || ||||||||||||||||||||||||||||||| ||||||||||||| |||||||||||
orf64ng        EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 359

orf64.pep      ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT 394
               ||||||||| ||||||||     |  |
orf64ng        ARHYLECVLDGLTTGVVVSYPLSCCRTAVFSTCHSSPLSYF 400



An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino
acid sequence <SEQ ID 256>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVSYP LSCCRTAVFS TCHSSPLSYF*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>:

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA
51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT
101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT
151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT
201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG
251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC
301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG
351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA
401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG
451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA
501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC
551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT
601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA
651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC
701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG
751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG
801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA
851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA
901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT
951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA
1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC
1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT
1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT
1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC
1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA
1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG
1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT
1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT
1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG
1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT
1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA
1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG
1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG
1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG
1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC
1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG
2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC
2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G

This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*

ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:

orf64ng-1.pep  MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 60
               ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf64-1        MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60

orf64ng-1.pep  DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL 120
               ||| ||||||||||||||||||||||| |||| |||||||||||||||||||||||||||
orf64-1        DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 120

orf64ng-1.pep  SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIEK 180
               ||||| ||||||  ||||||||||| ||| | || |||||||||||||||||||||||||
orf64-1        SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 180

orf64ng-1.pep  SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 240
               |||||  ||| | |  || ||  |||| |||||||||||||||||||||||||||||||
orf64-1        SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 240

orf64ng-1.pep  PENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFVE 300
               |  || ||||||||||||||||||||||||||| ||||||||||||||||||||||||||
orf64-1        PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 300

orf64ng-1.pep  PILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 360
               | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf64-1        PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 360

orf64ng-1.pep  RHYLECVLDGLTTGVVVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 420
               |||||||| ||||||||||| | |||||||||||||||| ||||||||||||||||||||
orf64-1        RHYLECVLEGLTTGVVVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 420

orf64ng-1.pep  AEVFAAIGAAAGTDKPVQVEYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIRAQK 480
               ||||||||||||||||| | |||||||||||||||||||||||||||||||||||| |||
orf64-1        AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGVVMVIDDITVLIHAQK 480

orf64ng-1.pep  EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTRSTDTIIKQVAALK 540
               |||||||||||||||||||||||||||||||||||||| ||||||||||||| |||||||
orf64-1        EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 540

orf64ng-1.pep  EMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGEPLMMAADTTAMRQ 600
               ||||||||||| |||||||||||||||||||||||||||| ||||||||  |||||||||
orf64-1        EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 600

orf64ng-1.pep  VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK 660
               ||||||||||||||||| ||||||||||||||||||||||||||| ||||||||||||||
orf64-1        VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 660

orf64ng-1.pep  PAGTGLGLPVVKKIIGEHGGRISLSNQDAGGACVRIILPKTVETYAX 707
               ||||||||||||||| |||||||||||||||||||||||||| ||||
orf64-1        PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 707

Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56

Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)

Oue-v 1 7 AA I CAWLLYGLTAATG STSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66 . 

I -t-A* ++L GLT + + + R + + K R G 

30 Sbjct: 25 ISALAT FLILMGLTPWPTHQWIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90 

Query 67 FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126 

+ ++ R + G-F +V+V+P + + +++ ++ ++ WF T E + S++++++ + 
Sbjcr : 91 AAARL H I R I VG L FA W S W PAI LVA WAS LTLDRG LDRWFSMRTQE I VAS S VSVAQT YVR 150 

Que-y LAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAG— SGFAQLALYNAASGKIEKSINP 184 

A N + + + DL S+ YGSFQ+ AA+++ 

Sbjct: 152 EHALNIRGDILAMSADLTRLKSV YEGDRSRFNQILTAQAALRNLPGAMLI 200 

40 Query: 185 HQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYA 233 

-r D + ++ + I + V + +IG Q + N DY 

Sbjct: 201 RR-DLSWERAN-VNIGREFIVPANLAIGDATPDQPVIYLP — NDADYVAAWPLKDYDD 256 

Query: 234 --LFFRQPIPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTXXXXXXXXXXXXXVMA 291 
45 L+ + I V ++ A Y L + G+Q F + + 

Sbjct: 257 LYLYVARLIDPRVIGYLKTTQETLADYRSLEERRFGVQVAFALMYAVITLIVLLSAVWLG 316 

Query: 2 92 LYFARRFVEPILSLAEGAKAVAQGDFSQTRPVLRND-EFGRLTKLFNHMTEQLSIXXXXX 350 
L F+ + V PI L A VA+G+ P+ R + + L + FN MT +L 

50 Sbjct: 317 LNFSKWLVAPIRRLMSAADHVAEGNLDVRVPIYRAEGDLASLAETFNKMTHELRSQREAI 376 

Query: 351 XXXXXXXXXXXHYLECVLDGLTTGWVFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGW 410 

+ r VL G-r GV+ D + R+ N++AE++LG L+ + RH 
Sbjct: 37 7 LTARDQIDSRRRFTEAVLSGVGAGVIGLDSQERITILNRSAERLLG — LSEVEALHRHLA 4 34 

Que-y 411 HGVSAQQSLLAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGWM 4 67 

V LL E + VQ D + + V E + +G V+ 

Sbjct: 4 35 EWPETAGLLEEA EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWW 4 88 

60 Query: 4 66 V I D D I T VL I RAQKEAAWGE VAKRLAHE I RN PLT P I QLS AERLAWKLGGKLDDQDAQI LTR 527 

+ DDIT LI AQ+ +AW +VA+R+AHEI+NPLTPIQLSAERL KG + QD +1 + 
Sbjct: 4 89 TLDDITELISAQRTSAWADVARRIAHEIKNPLTPIQLSAERLKRKFGRHV-TQDREIFDQ 54 7 

Query: 528 STDTIIKQVAALKEMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGE 587 
65 ' TDTII+QV + MV+ F ++AR P +++QD++ +1 + L G + 

Sbjct: 548 CTDTIIRQVGDIGRMVDEFSSFARMPKPWDSQDMSEIIRQTVFLMRVGHPEWFDSEVP 607 

Query 588 PLMMAA-DTTAMRQVLHNIFKNXXXXXXXXDMPEVRVK SETGQDGRIVLTVCD 639 

PMA D +QLNIKN- P+VR + * ■ + G+D +V+ + D 

70 Sbjct: 608 PAMPARFDRRLVSQALTNILKNAAEAIEAVP-PDVRGQGRIRVSANRVGED LVIDIID 664 
Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPVVKKIIGEHGGRISLSNQDAG-GACVRIIL 698

NG G +E + EPYVT + GTGLGL +V KI + EHGG I L++ G GA +R+ L 
Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724 


Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 31

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 259>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG

451 CACGCGTTGG ATACG...

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG

151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>:

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o221 of E. coli (accession number P37619) 
ORF66 and o221 protein show 67% aa identity in 155aa overlap:

MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
M F+ Q+ KALF L LFH+L+I +SNYLVQ PIG HTTWGAFSFPFIFLATDLTV 
MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60 

RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA 



+GQILD+ VFN+LR+ + WW+AP AST+ G+ DT 
LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N.
meningitidis:



orf 66 . pep 
orf 66a 



10 20 30 40 50 60 

M YAFTAAQQQKALFRLVLFHI LI IAASNYLVQFPFQI FG IHTTWGAFS FP FI FLAT DLTV 
I | | J M I II I I I I I I I M I I i I I I I I I I M I I i I I I I M I I M I I II I I I I II I I I II 
MYAFTAAQQQKALFWLVLFHI LI IAASNYLVQFPFQI SGI HTTWGAFS FPFIFLATDLTV 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf66 pep RI FGSHLARR I I FWVMFPALLLSYVFS VLFHNGSWTGLGALSEFNT FVGRI A LAS FAAYA 
I | I || M I || I I I I I I I I I I I I I I I I I I I I I I I M I I I I II I I M I M 1 I I I I I M M I I 
orf 66a R I FG S HLARR I I FWVMFPALLLSYVFS VLFHNGSWTGLGALSEFNT FVGRI ALAS FAAYA 

70 80 90 100 110 120 

130 140 150 

orf 66. pep IGQILDI FV FNKLRRLKAWWIAPNAS TVIGHALDT 
: | I M M I II I I I I I I I t j I : II : i I I I I I : i I I I 
o r f 6 6a LGQILDIFV FNKLRRLKAWWVAPTAS TVIGNALDTLVFFAVAF YAS S DG FMAANWQG I AF 

130 140 150 160 170 180 

orf 66a VDYLFKLT VCGLFFLPAYGVILNLL TKKLTTLQTKQAQDRPAPSLQNPX 

190 200 210 220 

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGGCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 264>: 

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 



                      10        20        30        40        50        60

orf66a.pep    MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV
              |||||||||||||| |||||||||||||||||||||| ||||||||||||||||||||||
orf66-1       MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV

                      10        20        30        40        50        60

                      70        80        90       100       110       120

orf66a.pep    RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf66-1       RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA

                      70        80        90       100       110       120

                     130       140       150       160       170       180

orf66a.pep    LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF
              :|||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf66-1       IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF

                     130       140       150       160       170       180

                     190       200       210       220      229

orf66a.pep    VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX
              |||||||||| ||||||||||||||||||||||||||||||||||||||
orf66-1       VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX

                     190       200       210       220

Homology with a predicted ORF from N. gonorrhoeae 

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae:

orf66.pep     MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
              |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng       MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66.pep     RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
              |||||||||||||||||||| ||||||||||||||||||| |:|||||||||||||||||
orf66ng       RIFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120

orf66.pep     IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
              :|||||||||:|||||||||||| ||||||:||||
orf66ng       LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180



The complete length ORF66ng nucleotide sequence <SEQ ID 265> is:

1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT

51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC

101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT

201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat

251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG

301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC

401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC

451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG

501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG

601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC

651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA



This encodes a protein having amino acid sequence <SEQ ID 266>:

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA

101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

An alternative annotated sequence is:

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap:

orf66-1.pep   MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
              |||:|||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf66ng       MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf66-1.pep   RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
              ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf66ng       RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120

orf66-1.pep   IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180
              :|||||||||:||||||||||||:|||||||||||||||||||||||| |||||||||||
orf66ng       LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

orf66-1.pep   VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229
              ||||||||||||||||||||||||||||||:||||||||||:|||||||
orf66ng       VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 229

Furthermore, ORF66ng shows significant homology with an E. coli ORF:

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)
>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221
Score = 273 bits (692), Expect = 5e-73
Identities = 132/203 (65%), Positives = 155/203 (76%)

Query: 1   MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
           M   +  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
Sbjct: 1   MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61  RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
           LGQILD+ VF++LR+ + WW+AP AST+ GN  DTL FF +AF+ S D FMA +W  IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
           VDY FK+ +  +FFLP YGV+LN
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203

Based on this analysis, including the homology with the E. coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
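
The text does not state which method was used to predict the putative transmembrane domains. As an illustration only, one widely used approach is a sliding-window hydropathy profile on the Kyte-Doolittle scale. In the sketch below the function names, the 19-residue window and the 1.6 cutoff are assumptions chosen for illustration, not values taken from this document.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean exceeds threshold.

    The window size and threshold are illustrative assumptions.
    """
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, h in enumerate(profile) if h > threshold]


# First 60 residues of ORF66ng (SEQ ID 266), written in blocks of 20.
orf66ng_start = (
    "MYALTAAQQQKALFRLVLFH"
    "ILIIAASNYLVQFPFRIFGI"
    "HTTWGAFSFPFIFLATDLTV"
)
print(hydrophobic_windows(orf66ng_start))
```

Windows flagged by such a scan are only candidates; the patent's own underlined domains may have been derived with a different program or cutoff.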

Example 32 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 267>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA

501 TGGCTGCTAC GGCGTTGAT..

This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE

151 YSNCLWYEDK RRINRTYGCY GVD. . 

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 *

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 
ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf72 .pep MVIKYTNLNFAKLS IIAI LMMYSFEANA NAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 
I I I ! I I I I I I M I I I M I I I I I I I I I I I I I I M M I I I ! I I i I M I I I I I I I I I I I I I I 
30 orf72a MVIKYTNLNFAKLS 1 1 AILMMYSFEANA NAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72 . pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAK FSTRAVPYVGTALLA 
35 I I I I I ! I I II M I I I I I I I 1 I I I I I I I I I I I M I I I I I I I I I I I I I I I I I M I I II I I I 

orf 72a DLIKTVDLTH I PTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

' 70 80 90 100 110 120 



130 140 150 160 170 

40 orf 72 . pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 

I I I I I I I I I I I I I i I I I I I I I I I I I : I 
orf 72c HDVYETFKEDIQARGYQYDPETDKFAKVSGX 
130 140 150 

The complete length ORF72a nucleotide sequence <SEQ ID 271> is:

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 *

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap:

                      10        20        30        40        50        60

orf72a.pep    MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1       MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS

                      10        20        30        40        50        60

                      70        80        90       100       110       120

orf72a.pep    DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf72-1       DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA

                      70        80        90       100       110       120

                     130       140       150

orf72a.pep    HDVYETFKEDIQARGYQYDPETDKFAKVSGX
              |||||||||||||||||||||||||||||||
orf72-1       HDVYETFKEDIQARGYQYDPETDKFAKVSGX

                     130       140       150

Homology with a predicted ORF from N. gonorrhoeae

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N.
gonorrhoeae:

orf72 peD MVIKYTNLNFAKLS I IAILMMYS FEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60 

M | : | | I I I I I I I M I I M I I I I I M I I I : I i I I I I I I I : I I I I i I : I : III 

orf72ng MVTKHTNLNFAKLS I IAILMMYS FEANANAVKI SET LSVDTGQGAKVHKFVPKSSN I YSS 60 

orf72 pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120 

II 1:11111 I I I I I I I II I I I I I I I I I I I I I : I I I M : I INI: M I I I M 

orf72ng DLTKAVDLTHI PTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 120 

35 orf7 2 pep H DV YET FKEDI QARG YQY D PET DKFVKG YE Y SNCLW YEDKRR I NRT YG C YG VD 173 

* " | | | | | | | | I M I I I I : I II I I U I I I II I I : I I I II I I : M I I I I II I II M 

orf72ng HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180 

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA

501 EKIRFAVLLA FIIMSAFWFG SLGGE*

After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT

This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF

ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:

10 20 30 40 50 60 

orf 72ng-l . pe MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 

I I I : I I I II I I I I I I I I I i I I I I I ! I I I I I I I I M : I I I I M I I I : I I i t I I : I : ill 
orf 72-1 MVIKYTNLNFAKLSI1AILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72ng-l . pe DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 

II I : M II I I I I I I I I I I M I I ! I I I I I I I I I : I I I II : I I I I I : I I I II I I I II M I 
15 orf 72-1 DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

70 80 90 100 110 120 

130 140 
orf72ng-l.pe HDVYET FKE DI QARGCRYD PET DKF 
20 I I I I I I M I M I ! I I : I I I I I I M 

orf 72-1 HDVYETFKEDIQARGYQYDPETDKFAKVSGX 

130 140 150 

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 33 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 GCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC . . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL
101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR
151 SRNAIEHKKD E*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

or n 3 pep MRFFGIGFLVLLFLEIMSIWVADWLGGGWTLFLMAAGFA AGVLMLRQTGLTGLLIAGAA 
I I i M I I I I I I I I I i M M I M I t I I M I I 1 I I I M I I I I I I : I I I : I I I : I I I I I I I I 
orf 7 3a MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFA AGWMLRHTGLSGLLLAGAA 
JO " 10 20 30 40 '50 60 

70 

or f 7 3 . oep MRSGGKVSVYQMLWPI 
I I I I I : II I I II I i 

J 5 orf 7 3a MRSGGRVSVYXMLWXIRYTVAAVC XMSPGFVSSVXAVLLXL PFKGGAVLQAGGAENFFNM 

The complete length ORF73a nucleotide sequence <SEQ ID 281> is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA

This encodes a protein having amino acid sequence <SEQ ID 282>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR

151 FRNAXEHKKD E*

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap 

10 20 30 40 50 60 

orf 7 3a . oeo MRFFGIGFLVLLFLEIMSIWVADWLGGGWTLFIJ^TFAAGVVMLRHTGLSGLLLAGAA 
35 I I I M M I I I I I I I I I I t I I I M I I 1 I I II I II I I II I I I I I : I I I I I I I I I I I t M I I 

or f73-i MRFFG I GFLVLLFLE IMS I VWVADWLGGGWTLFLMAAGFAAGVLMLRHTG LSGLLLAGAA 

10 20 30 -40 50 60 

70 80 90 100 110 120 

40 orf 7 3a . oeo MR S GGRVS V YXMLWX I RYT VAAVCXM S PG FVS S VXAVLLXLP FKGGAVLQAGGAEN FFNM 

I M I I I I M I Ml MINIM! MINIMI INI I I I I I I I It I M I I M I I I I 
orf73-T MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 

45 130 140 150 160 

orf 7 3a. pep NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX 
) I I I I ( i I I I I I I I I M II I Mil I III I N II II 
orf 73-1 NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 
130 140 150 160 

50 

Homology with a predicted ORF from N. gonorrhoeae

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N.
gonorrhoeae:

orf73.pep     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60
              ||||||||||||||||||||||||||||||||||||| |||||||||:|||:||||||||
orf73ng       MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60

orf73.pep     MRSGGKVSVYQMLWPI 76
              ::|:||||||||||||
orf73ng       VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA

This encodes a protein having amino acid sequence <SEQ ID 284>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG 

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR

151 SRNAIEHEKD E* 

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:

                      10        20        30        40        50        60

orf73-1.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
              ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf73ng       MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA

                      10        20        30        40        50        60

                      70        80        90       100       110       120

orf73-1.pep   MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
              ::|:|:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf73ng       VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM

                      70        80        90       100       110       120

                     130       140       150       160

orf73-1.pep   NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
              ||||||||| :||||||||||||:| |||||||||||:||||
orf73ng       NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX

                     130       140       150       160
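
The identity figures quoted for each pairwise comparison can be reproduced by counting matching columns in the aligned sequences. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, it assumes the two sequences are already aligned to equal length with `-` for gaps, and it divides by the full overlap length, which may differ from the convention of the alignment program used for the original analysis.

```python
def percent_identity(a, b):
    """Return (identical columns, overlap length, percent identity).

    Assumes a and b are pre-aligned strings of equal length; '-' is a gap
    and never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    overlap = len(a)
    return identical, overlap, 100.0 * identical / overlap


# Residues 121-161 of ORF73-1 and ORF73ng (last alignment block above,
# without the terminal stop symbol), written in blocks of 10.
orf73_1 = "NQSGRKEGFS" "RDDDIIEGEY" "TVEEPYGGNR" "SRNAIEHKKD" "E"
orf73ng = "NQSGRKEGFF" "HDDDIIEGEY" "TVEKPDGGNR" "SRNAIEHEKD" "E"

identical, overlap, pct = percent_identity(orf73_1, orf73ng)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")  # 36/41 identical (87.8%)
```

Applied to the full 161-residue sequences, the same count gives 151 identical positions, consistent with the 93.8% stated above.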

Based on this analysis, including the presence of a putative leader sequence and putative
transmembrane domain in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 34 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 285>:

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG

151 GCG... ...GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA

501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT . . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 

1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A ...AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD..

Further work revealed the complete nucleotide sequence <SEQ ID 287>: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

20 151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 
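
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the reading frame with the standard genetic code. The following is a minimal sketch, not the analysis software used for this work; the function name `translate` and the choice of the first 60 nucleotides of SEQ ID 287 as input are illustrative assumptions.

```python
# Standard genetic code, indexed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give 'X'."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 60 nucleotides of SEQ ID 287 (hypothetical choice of test fragment).
fragment = "ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC ATTATACGTG"
print(translate(fragment))
```

Applied to a complete listing, the same function should reproduce the corresponding protein listing up to the terminal stop codon, which it reports as `*`.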

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKAXXXXAEDTR
                     ||||||||||||||||||||||||||||||||||||||||||    |||||
orf75a               MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR
                             10        20        30        40        50

                    70        80        90       100       110       120
orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
                   60        70        80        90       100       110

                   130       140       150       160       170       180
orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV
            ||||:|||||||||| ||||||||||| ||||||||||||||||||||||||||:|||:|
orf75a      RVREVGFKVVPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV
                  120       130       140       150       160       170

                   190       200       210       220       230       240
orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM
            ||||||||||:||||||||||||||||||||||||||||||||||||||:|||:||||||
orf75a      MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM
                  180       190       200       210       220       230

                   250       260       270       280       290
orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK
                  240       250       260       270       280       290

orf75a      X 

The complete length ORF75a nucleotide sequence <SEQ ID 289> is: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC 

501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 

851 TGGCACTGTC TTGGAAAAAC AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 290>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 
51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 
101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP 
151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 
201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 
251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 



                    10        20        30        40        50        60
orf75a.pep  MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf75a.pep  GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKV
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||:||||
orf75-1     GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf75a.pep  VPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIG
            |||||||||||||||||| ||||||||||||||||||||||||||:|||:||||||||||
orf75-1     VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf75a.pep  ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75-1     ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
                   190       200       210       220       230       240

                   250       260       270       280       290
orf75a.pep  EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                   250       260       270       280       290

Homology with a predicted ORF from N. gonorrhoeae 

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N. gonorrhoeae: 

orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKA    AEDTR  56
            | |||||| ||||||||||||||||||||||||||||||||||||||||||    |||||
orf75ng     MSVFQTAFFMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR  60

orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR 116
            |||||||||||||:|||||||||||||||::|:||||:||||||||||||||||||||||
orf75ng     VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLAR 120

orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 176
            ||||||||||||||| |||||||||||  |||||||||||||||||||||||||||||:|
orf75ng     RVREAGFKVVPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVV 180

orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236
            ||||||||||:||||||||||||||||||||||||||||||||||||||:|||:||||||
orf75ng     MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 240

orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 288
            ||||||||||||||||||||| ||||:|||||||||||||||||||||||||
orf75ng     VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300

An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino 
acid sequence <SEQ ID 292>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 - 

After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 £TTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap: 

                    10        20        30        40        50        60
orf75-1.pep MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75ng-1   MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf75-1.pep GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
            ||||:|||||||||||||||::|:||||:|||||||||||||||||||||||||||||||
orf75ng-1   GIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf75-1.pep VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
            ||||||||||||||||||  |||||||||||||||||||||||||||||:||||||||||
orf75ng-1   VPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf75-1.pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75ng-1   ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
                   190       200       210       220       230       240

                   250       260       270       280       290
orf75-1.pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
            |||||||||||| ||||:||||||||||||||||||||||||||||||||||
orf75ng-1   EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                   250       260       270       280       290

Furthermore, ORF75ng-1 shows significant homology to a hypothetical E. coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 

>gi|606086 (U18997) ORF_f286 [Escherichia coli] 

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic 
region [Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSWGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 

K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
Sbjct : 2 KQHQSADNSQ— GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 5 9 

Query : 64 GRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R RE AG +WP+ 
Sbjct : 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATL 183 

G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179 

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 

D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ + 

Sbjct: 180 EDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWKEDENRRKGEMVLIV-EGHKAQ 238 

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 

EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
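
The percentages in the BLAST summary line above follow directly from the reported counts. The sketch below recomputes them; rounding to the nearest integer is an assumption about how the summary was printed, and the names `blast_percent` and `summarize` are hypothetical.

```python
def blast_percent(count: int, length: int) -> int:
    """Return count/length as a whole-number percentage (nearest integer)."""
    return round(100 * count / length)


def summarize(identities: int, positives: int, gaps: int, length: int) -> dict:
    """Recompute the three percentages of a BLAST summary line."""
    return {
        "identities": blast_percent(identities, length),
        "positives": blast_percent(positives, length),
        "gaps": blast_percent(gaps, length),
    }


# Counts taken from the E. coli hit reported for ORF75ng-1.
stats = summarize(identities=128, positives=171, gaps=4, length=284)
print(stats)
```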

Example 35 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 295>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG 

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL 

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

20 101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-1>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an 
ORF (ORF76a) from strain A of N. meningitidis: 

                    10        20        30
orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL
            |||||||||||||||||||| |||||||||
orf76a      MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

            //

                                                 70        80        90
orf76.pep                                XELVRNQLEQGLRQEKARLKIDALLEENGVKPX
                                          ||||||||||||||||||||||:|||||||||
orf76a      DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX
                  200       210       220       230       240       250

The complete length ORF76a nucleotide sequence <SEQ ID 299> is: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 
501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 
601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 
651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 
701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC 
751 AAACCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 300>: 



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP* 
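
The listings in this document give each sequence in rows of up to five 10-residue groups, each row preceded by the position of its first residue. For computational work it is convenient to collapse such a listing into one continuous string. The following is a minimal sketch of that step; the function name `parse_listing` is hypothetical, and the two rows used as input are the last two rows of SEQ ID 300.

```python
import re


def parse_listing(text: str) -> str:
    """Collapse a numbered 10-residue-group listing into a single string."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "//":
            continue  # skip blank lines and the '//' gap marker
        # Drop the leading position index, then join the groups.
        body = re.sub(r"^\d+\s+", "", line)
        residues.append(body.replace(" ", ""))
    # A trailing '*' marks the stop codon and is not a residue.
    return "".join(residues).rstrip("*")


listing = """
201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV
251 KP*
"""
seq = parse_listing(listing)
print(len(seq), seq)
```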



ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 



                    10        20        30        40        50        60
orf76a.pep  MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf76a.pep  AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||: |::|
orf76-1     AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf76a.pep  YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
            ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf76a.pep  LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
                   190       200       210       220       230       240

                   250
orf76a.pep  IDAILEENGVKPX
            |||:|||||||||
orf76-1     IDALLEENGVKPX
                   250
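
The identity figures quoted for ungapped alignments such as the one above are the fraction of aligned positions carrying the same residue. The sketch below illustrates the calculation on residues 101-130 of ORF76a and ORF76-1 only, not on the full proteins; the function name `percent_identity` is hypothetical, and the final line simply restates the full-length figure on the assumption that six of the 252 aligned positions differ.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 101-130 of SEQ ID 300 (ORF76a) and SEQ ID 298 (ORF76-1).
orf76a_frag = "EYVRFLERSETVSESALRQFYERQIRMIKL"
orf76_1_frag = "EYVRFLERSETVSEDELHKFYEQQIRMIKL"

fragment_identity = round(percent_identity(orf76a_frag, orf76_1_frag), 1)
full_length_identity = round(100 * (252 - 6) / 252, 1)
print(fragment_identity, full_length_identity)
```

The fragment was chosen because it contains five of the differences between the two proteins, so its identity is well below the full-length value.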



Homology with a predicted ORF from N. gonorrhoeae 

The N- and C-terminal aa sequences of ORF76, aligned with a predicted ORF (ORF76.ng) from 
N. gonorrhoeae, show 96.7% and 100% identity in 30 aa and 31 aa overlaps, respectively: 

orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 30
            |||||||||||||||||||| |||||||||
orf76ng     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 60

            //

orf76.pep                                ELVRNQLEQGLRQEKARLKIDALLEENGVKP 251
                                         |||||||||||||||||||||||||||||||
orf76ng     VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP 251

The complete length ORF76ng nucleotide sequence <SEQ ID 301> is: 

1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc 

551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG 

601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc 

751 AaacCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 302>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL 

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap: 

                    10        20        30        40        50        60
orf76-1.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||:|||||||||
orf76ng     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf76-1.pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||: |::|
orf76ng     AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf76-1.pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
            ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76ng     YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf76-1.pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
            ||||||:||||||||:|||||||||||||: |||||||||||||||||||||||||||||
orf76ng     LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK
                   190       200       210       220       230       240

                   250
orf76-1.pep IDALLEENGVKPX
            |||||||||||||
orf76ng     IDALLEENGVKPX
                   250

Furthermore, ORF76ng shows significant homology to a B. subtilis export protein precursor: 

sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S15269 
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein 
[Bacillus subtilis] 

>gi|2?26124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis] 
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis] 

Length = 292 
Score = 50.4 bits (118), Expect = 1e-05 

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%) 

10 Query 70 VLKNRALKEGLDK DKDVQNRFKI AEAS F YAEEYVRFLERSETVSE 114 

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++ 

Sbjct : 53 VLTOLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112 

Query 115 SA « LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163 

15 A +++++E 1+ + A ++ A + ++ L KG FE L K Y 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172 

Query 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218 

DAG F Q+ E + + G+V+ DPVK Y++ K +E D 

20 Sbjct- 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231 






Query: 219 QPFELVRNQLEQGLRQEKA 237 

EL LEQ L A 

Sbjct: 232 MKKELKSEVLEQKLNDNAA 250 



Based on this analysis, including the presence of a putative leader sequence and a RGD motif in 
the gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
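
The position of a short motif such as RGD can be located in a sequence by a simple substring scan. The sketch below applies this to residues 181-200 of SEQ ID 302 only; the function name `find_motif` and its `start_position` parameter are hypothetical, and no claim is made that this is how the motif was originally identified.

```python
def find_motif(sequence: str, motif: str, start_position: int = 1) -> list:
    """Return the 1-based positions at which motif begins, offset by start_position."""
    positions = []
    idx = sequence.find(motif)
    while idx != -1:
        positions.append(idx + start_position)
        idx = sequence.find(motif, idx + 1)
    return positions


# Residues 181-200 of the gonococcal protein (SEQ ID 302).
fragment = "LASQFAGMNRGDVTRNPVKL"
rgd_positions = find_motif(fragment, "RGD", start_position=181)
print(rgd_positions)
```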

ORF76-1 (27.8kDa) was cloned in the pET vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result), 
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed 
protein, and that it is a useful immunogen. 

Example 36 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 303>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTTACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>: 

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 

51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT 

// 

401 QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG 

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 
801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 
901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATGA 

This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>: 

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF81 shows 84.7% identity over a 85aa overlap and 99.2% identity over a 121aa overlap with 
an ORF (ORF81a) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf81.pep   MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL
            ||||:::| ||||||||||||| : :|||||||||:|||||||||||||||||| |:|||
orf81a      MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL
                    10        20        30        40        50        60

                    70        80
orf81.pep   LIAVFFAFSIIANNVHYADYQSWMT
            |||||||||||||||||| ||||:|
orf81a      LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
                    70        80        90       100       110       120

            //

                                                 120       130       140
orf81.pep                                 QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD
                                          ||||||||| ||||||||||||||||||||
orf81a      IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD
                280       290       300       310       320       330

                   150       160       170       180       190       200
orf81.pep   IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81a      IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
                340       350       360       370       380       390

                   210       220       230
orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            ||||||||||||||||||||||||||||||||
orf81a      CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
                400       410       420


The complete length ORF81a nucleotide sequence <SEQ ID 307> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 
51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 
101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 
151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 
201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 
251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 
301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 
351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 
451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC 
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 
601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 
701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG 
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 
801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 
851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 
901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 
951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 
1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 
1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 
1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 
1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 
1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 
1251 GGCGGAATAT GTTTATCCGC AATGA 


This encodes a protein having amino acid sequence <SEQ ID 308>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY 
51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG 
101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 
151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 
201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 
251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 
301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
401 GNLITGDAGS LNIRDGKAEY VYPQ* 



ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap: 



10 20 30 40 50 60 

orf 81a . pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL 
I I ! I : : : I I I I II II I M I I I II I I I II I M M I : 1 I II I I I M It II M I I II I : \ I I 
orf 8 1-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 

10 20 30 " 40 50 60 
                    70        80        90       100       110       120
orf81a.pep  LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE
            |||||||||||||||||||||||:|||||||||||:||||:||||||||||||:||||||
orf81-1     LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE
                    70        80        90       100       110       120

130 140 150 160 170 180 

orf81a.pep  VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf81-1     VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY

130 140 150 160 170 180 

190 200 210 220 230 240 

15 orf81a pep FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF 

I | | M I II I : I I: I I I I I I I : I N II : I I I I II M I I I II I I N I I I N N N N 

orf 81-1 FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 

190 200 210 220 230 240 

20 250 260 270 280 

orf 81a pep LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD 

| | : | | M I I I I I N I 11 I N I I II I I I I N I! : I I I II I I I I I I I II 
or f 81-1 LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 



25 



orf 81a . pep 



or f 81a . pep 



orf 8^ -1 TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 
30 " 310 320 330 340 350 360 

290 300 310 320 

IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

I I I I I I I I I 1 I 1 I 1 1 I I I I 1 I 1 I 1 I I I I I I I I 1 
35 orf 81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF \ 

370 380 390 400 410 420 

330 340 350 360 370 380 

orf 81a pep AYTSDHGQYVRQDI YNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

40 | | | M I I II I II II I N II I I I I I N I I I I I I N I N I II I I II I I I I I N I I I I 

orf 81-1 AYTSDKGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

430 440 450 460 470 480 

390 400 410 420 

45 orf 81a . pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 

I | | || I I I I II I I I I N I I I I II II N I I I I I I I I I I I I I N II I 
orf 81-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 

490 500 510 520 
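
The identity figures quoted for these alignments can be checked directly from a pair of aligned rows. The following is a minimal sketch, assuming identity is counted as the number of columns carrying the same residue divided by the number of aligned columns; the function name `percent_identity` is illustrative and is not the software used in the original analysis. It is applied to residues 1-60 of the ORF81a/ORF81-1 alignment above.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (matches, columns, percent) for two equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Ignore columns in which both rows carry a gap character.
    columns = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    # A column counts as a match only when both residues are the same non-gap letter.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return matches, len(columns), 100.0 * matches / len(columns)


# Residues 1-60 of ORF81a and ORF81-1, written in blocks of ten.
orf81a = ("MKKSLFVLFL" "YSSLLTASEI" "AYRFVFGIET"
          "LPAAKMAETF" "ALTFVIAALY" "LFARYKATRL")
orf81_1 = ("MKKSFLTLVL" "YSSLLTASEI" "AYRFVFGIET"
           "LPAAKIAETF" "ALTFVIAALY" "LFARYKVTRL")

matches, columns, pct = percent_identity(orf81a, orf81_1)
print(f"{matches}/{columns} identical ({pct:.1f}%)")
```

The figure for a single block is not expected to equal the 77.9% quoted for the full 524 aa overlap, which also counts the columns where ORF81a has no counterpart.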

Homology with a predicted ORF from N. gonorrhoeae

The aligned aa sequences of the N- and C-termini of ORF81 and a predicted ORF (ORF81.ng) from
N. gonorrhoeae show 82.4% and 97.5% identity in 85 aa and 121 aa overlaps, respectively:

orf 81 .pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL 60 
| I I I : : : I I I i II ! I II 11 II : : II N I I I I I : I N II II I : I N N I II I I :: I I 
55 orf81ng MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 60 

orf 81 .pep LIAVFFAFSIIANNVHYADYQSWMT 85 

I I II II I II : I I 11 N II I II I I I 
orf81ng L I AVFFAFSM I ANN VH YAV YQSWMTG IN YWLMLKE VTE VGS AGASMLDKLWLPALWGVAE 120 

60 // 

orf81 pep QT V FE QLQKT PDGNWL FAYT S DHG Q YVRQ D 433 

| | | I I I I I I II I I II N I I I I I I I I I N I 
orf81ng ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 433 

65 orf 81 .pep IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 4 93 

| | 1 | | I | f I 1 I I = I I I 1 I 1 t I I I I I I I 1 I 1 I I I I I I I I t 1 I I f I I 1 I 1 1 I I I I I I 1 1 I I I 
orf 81ng IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 4 93 
orf81.pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ 524

I I M i I I I I I M I I M I I I I I :! I I I M M I 
orf81ng CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ 524 

The complete length ORF81ng nucleotide sequence <SEQ ID 309> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC 

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG 

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG 

701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT
801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA
901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT
1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 
1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG 
1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 
1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 
1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 
1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA 
1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA
1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATAA

This encodes a protein having amino acid sequence <SEQ ID 310>: 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT

501 GNLITGDAGS LNIRNGKAEY VYPQ* 

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap: 



10 20 30 40 • 50 60 

orf 81ng-l .pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 
I I I I : : : I I I II I M M I I I I I I I I II I I M I I I : I II M M I : II I I I II II I I : : I I 
orf 81-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 

10 20 30 40 50 60 

70 80 90 • 100 110 120 

orf 81ng-l . oep LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 
I I I M II I I : I I I I I M ! I I I I I I I i I I I 1 I I I ! I If I I I I I I I I M I I I I M : I M I I 
orf 81-1 LIAVFFAFS 1 1 ANNVHYAVYQSWMTGINYW LMLKEVTEVG SAGASMLDKLWLPVLWGVLE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 81ng-l.pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHG I SPKPTYSR IKANYFSFGY 
M | | M I I I I I I I I II II I ! I I I I I I II M M II I I I II M I 1 M I I I I I I I I 1 I I I I I I 
orf 81-1 VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 
130 140 150 160 170 180


190 200 210 220 230 240 

FVGRVLPYQLFDLSKI PVFKQPAPSKIGQGS IQN I VLIMGESESAAHLKLFGYGRETS PF 
M | | | | | M ! I M I : I t : II M I I I I t I I I I : I I M I I I ! I 1 I I M I 11 I M M I M M I 
FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 

190 200 210 220 230 240 

250 260 270 280 290 300 

LTRLSQADFKP I VKQS YS AGFMTAVSLPS FFNVI PHANGLEQ I SGGDTNMFRLAKEQG YE 

I | | M I I I I I I I I I M I I II I I ! I I M I ! I I I : » I I M I I I I I I M II I I I M I I I I I I I 
LTRLSQADFKP IVKQSYSAGFMTAVSLPSFFNAIPHANGLEQI SGGDTNMFRLAKEQG YE 

250 260 270 280 290 300 

310 320 330 340 350 360 

TY^YSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF 
| | | M I I I 1 I : I I I I I M I I I M I I I I I I I I I I I I I I I i I I I I I I 1 I I I I I I I I I I t: M 
TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 
310 320 330 340 350 360 

370 380 390 400 410 420 

I VLHQRG S HAP YGALLQPQDKVFGEAD I VDKYDNT I HKT DQM I QT VFEQLQKQP DGNWL F 
1 I I t I I M I I I I I I I I I I I I I I M I I 1 I M I I I i I I I I I II I I M I I I I I I 1 I M I I I I 1 
IVLHQRGSHAFYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

370 380 390 400 410 420 

430 440 450 460 470 480 

AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

| | | | J] | 1 M | M II I I I I I M I I I : I M II I I I I I II I M I I i M I I I II I I I M I I I I 
AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

430 440 450 460 470 480; 






490 500 510 520 

orf81ng-1.pep  LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX
               ||||||||||||||||||||||||||||||||||:||||||||||
orf81-1        LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX

490 500 510 520 

Furthermore, ORF81ng shows significant homology to an E. coli OMP:

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli] Length = 547
Score = 87.4 bits (213), Expect = 2e-16

Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)

Querv 25 VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS — RLLIAVFFAFSMIANNVHYAVYQ 81 

VFGI LA+A LF+++R + RLL+A F + A ++ ++Y 

Sbjct: 29 VFGITNLVASSGAHMVQRLLFFVLTILVVKRISSLPLRLLVAAPFVL-LTAADMSISLY- 86 

Query: 82 SWMT-- GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134 

S W T G ++ + EV A ML ++ PL A + L + 
Sbjct: 87 SWCTFGTTFNDGFAISVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVI IKYDV 141 

Query: 13 5 HFSADILFAFLMLMIFVRSF DTKQEHGISPKPTYSRIKAN — YFSFGYFVG 183 

+ L+L++ S D K ++ SP SR +F+ YF 

Sbjct: 142 SLPTKKVTGILLLIVISGSLFSACQFAYKDAKNKNAFSPYILASRFATYTPFFNLNYFAL 201 

Query 184 RVLPYQ— LFDLSKI PVFKQPAPSKIGQGS IQNIVLIMGESESAAHLKLFGYGRETSPFL 241 

+Q L + +P F+ + I VLI+GES ++ L+GY R T+P + 

Sbjct: 202 AAKEHQRLLSIANTVPYFQL SVRDTGIDTYVLIVGESVRVDNMSLYGYTRSTTPQV 257 

Query: 242 TRLSQADFKPIVKQSYSAGFMTAVSLP S FFNVI PHANGLEQ I SGGDTNMFRLAKEQG 298 

+q + Q+ S TA+S+P + +V+ H I N+ +A + G 

Sbjct: 258 E— AQRKQIKLFNQAISGAPYTALSVPLSLTADSVLSH DIHNYPDNI INMANQAG 310 

Query 299 YETYFYSAQA ENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQ 355 

++x++ S+Q+ +N A+ ++ ++ + Y G DE LLP + Q 

Sbjct: 311 FQT FW LS SQS AFRQNGTAVT S I AMRAMETVYVRGF DELLLPHLSQALQQ 359 

Query 356 — QGRHFIVLHQRGSHAPYGALLQPQDKVFGEADIVDK-YDNTIHKTDQMIQTVFEQLQK 412 

Q + IVLH GSH P + VF D D YDN + IH TD VFE L+ 

Sbjct' 360 NTQQKKLIVLHLNGSHEPACSAYPQSSAVFQPQDDQDACYDNSIHYTDSLLGQVFELLK- 418 
Query: 413 QPDGNWLFAYTSDHG QYVRQDI YNQG -- TVQPDSYIVPL-VLYSP 454

D Y +DHG ++++Y G +Y VP+ + YSP 

Sbjct: 419 — DRRASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 464 
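
The percentages in the BLAST summary above follow from the raw counts. A short sketch, assuming the report truncates each percentage to an integer (an assumption suggested by the Gaps figure, where 70/468 is about 14.96% but is printed as 14%); `blast_percent` is an illustrative helper and not a BLAST function.

```python
def blast_percent(count, length):
    """Integer percentage, truncated rather than rounded (assumed reporting rule)."""
    return 100 * count // length


alignment_length = 468
counts = {"Identities": 122, "Positives": 198, "Gaps": 70}

# Recompute each percentage from its raw count.
summary = {name: blast_percent(n, alignment_length) for name, n in counts.items()}

for name, pct in summary.items():
    print(f"{name} = {counts[name]}/{alignment_length} ({pct}%)")
```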

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 37

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 311>:

1 . . .ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC.GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC 

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A... 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 

1 ..TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV..

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-1>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP

301 DVGNEVIRRR KGG*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of N.
meningitidis:

10 20 30 40 50 

orf83 peD TL LLFIPLVLTX CGTLTGIIAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 

IN : I I I I M Itlllll I I I I I I I I I II M I I I I II I I I I I I I I I M I I I I I I 1 
orf83a MKTLLXLI PLVLTA CGTLTGI PAHGGGKRFAVEQELVAASSRAAVKEM PL SALKGRKAAL 

|0 " 10 20 30 40 • 50 60 

60 70 80 90 100 110 

orf8 3 pep yvSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
I | | | | | M | | | I M I I I II I I M I I M I I I II I M I I I I I I I I I I M I M I M I I I I I 1 I 
1 ^ orf 83a YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

AJ 70 80 90 100 110 120 

120 130 140 150 160 170 

orf 83 Dep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
20 " " * I ! I I I I II 1 I I I I I M I I 1 I 1 I I I I I I I I I I M I I M II I I 1 I M M I M II I I I I N I 

orf8 3a T S LLN APAAALTKN SGRKGERS AGLS VNGTGDYRNETLLAN PRDVS FLTNL I QT V FY LRG 

130 140 150 160 170 180 

180 190 
25 orf83.peP IEWPPXYADTDVFVTVDV 

Mill! I II I I I I M I II 

o^f 83a IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

190 200 210 220 230 240 

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 316>: 

1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP

301 DVGNEVIRRR KGG* 

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 

10 20 30 40 50 60 

orf 8 3a . pep MKTLLXLI PLVLTACGTLTGI PAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 
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I I I I I | { i I I i I I I I M I I I I I I M .1 I I I I I II 1 I I I I I I I I M I I I M I I I I I It I I I 
orf83-l MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 



70 80 90 100 110 120 

or f 83a . pep WSVMGDQGSGN I SGGRYS I DALIRGGYHNNPESATQYS YPAYDTTATTKS DALS SVTTS 
II I I I I I I I I I II I I I I I I I I I I I I I I M I I! I I I I I I I I I I I I I I I I I I I I I II M I I I 
orf 83-1 YVSVMGDQGSGNI SGGRYS I DALIRGGYHNNPESATQYS YPAYDTTATTKS DALS SVTTS 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 8 3a. pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
I I I I I I I I It I I I I I I M I I II I I I I I I I I M I I I I I I I I I I \ I I I I I I I I I I I I I I M I 
or f 83-1 TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 8 3a . pep IEWPPEYADT DVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 
. I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I M I I I I I I I I II I I M M I I I I : I I 
20 orf 83-1 IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 83a . pep TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
25 I I I t I I I 1 I I I I I I I I I I : I : I I 1 I I I I II I I II I II I I I I M I I I I I I I I II I I I I I I 

orf 83-1 TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 

250 260 270 280 290 300 



310 

30 orf 8 3a. pep D VGNE VI RRRKGGX 

I M II I I I I i I I I I 
O r f 8 3 - 1 D VGNE V I RRRKGGX 

310 



Homology with a predicted ORF from N. gonorrhoeae

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from N. 
gonorrhoeae: 

orf 83 . pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 
1111:111111 I I I II I I It I I I I I I I I I I I M I I I I I I I I I It I I I I I I I I M I 
40 orf 83ng MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 



58 
60 



45 



50 



orf 83 .pep 
orf 83ng 
orf 83 .pep 
orf 83ng 
orf 83 .pep 
orf 83ng 



YVSVMGDQGSGNI SGGRYS I DALIRGGYHNNPESATQYS YPAYDTTATTKS DALS SVTTS 118 
I I M 11 I I 1 I I I i I i I I I I I I I I 1 M I 1 II I I : I I I : I I I I I I I I I I I M I I 1 I I : i I I I 
YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 120 



TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
I I I I M I II I I I I : I I I I I I I M 1 I It I I I I 11 I I I I I I I I I I I I M I I I I I I I I I 11 I 
TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 



IEWPPXYADTDVFVTVDV 
I I I I I 1 I I I I I I I I I 1 I I 
IEWPPEYADTDVFVTVDVFGTVRSRTEI 



178 



180 



197 



.HLYNAETLKAQTKLE Y FAVDRDSRKLLI APK 2 4 0 



The complete length ORF83ng nucleotide sequence <SEQ ID 317> is:

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC

301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA

401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA



This encodes a protein having amino acid sequence <SEQ ID 318>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG*



ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap:



10 20 30 40 50 60 

orf 83-1 pep MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 
I | | | | | I I I M I I II I I I I I 1 I I I I I I I M I I 1 I I 1 I I I I I I I I II I M I I II I i I I I I I 
orf 83ng MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 83-1. pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
| I I I I I I I I M I II I I I 11 I I I I I I I I I I I M : I II : I I I II I M I I I I II 1 I I I : I I I I 
or^83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 83-1 peD T S LLN AP AAALTKN S GRKGER S AG LSVNGTGDYRNETLLANPRDVSFLTN LIQTVFYLRG 
I | ! M I I I I I I I i I : I M I I I I I I I M I M i i I I I I I I I I 1 I I I I I I I I 1 I I 1 I I I I I I I 
orf83ng TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 8 3-1 .pep IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 
| i I I I I I I I { I I I I I I 1 I M I M I I 11 I I 1 M I II II M I I I I I I I M I I I I I I I II : M 
orf8 3ng IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 8 3-1 .pep TAAYESQYQEQYALWTG PYKVSKTVKAS DRLMVDFS DI T PYG DTTAQNRPDFKQNNGKKP 
I I I II I M I I I I M I II I : I : M ! I ! I I I M I I If II II I II M I II I I I I I ! I M I : I 
orf83ng TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP 

250 260 270 280 290 300 

310 

orf 83-1. pep DVGNEVIRRRKGGX 
M I I I I M I 11 I I I 
o r f 8 3 n g DVGNE V I RRRKGGX 

310 

Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop) 
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein 
lipid attachment site (single-underlined), it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
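
Motif annotations of this kind can be reproduced with a simple pattern scan. The sketch below assumes that the lipid attachment site refers to the PROSITE prokaryotic membrane lipoprotein pattern (PS00013), written as {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C; the text does not state which pattern definition or software was used, and the additional positional rules attached to the PROSITE entry are not checked here. The regular expression and the name `find_lipobox` are illustrative only.

```python
import re

# Assumed PROSITE PS00013 pattern, translated to a regular expression.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox(protein):
    """Return the 1-based position of the candidate lipid-attachment cysteine, or None."""
    match = LIPOBOX.search(protein)
    # The pattern ends on the cysteine, so the exclusive 0-based end index
    # equals the 1-based position of that residue.
    return match.end() if match else None


# First 20 residues shared by ORF83-1 (SEQ ID 314) and ORF83ng (SEQ ID 318).
orf83_nterm = "MKTLLLLIPL" "VLTACGTLTG"
cys_pos = find_lipobox(orf83_nterm)
print(cys_pos)
```

Under these assumptions the scan reports the cysteine in the N-terminal segment VLTAC as the candidate attachment residue.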
Example 38

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
319>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC
451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 
551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 
601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 
651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 
701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT
851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA
901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 
951 gaAAGAAGTG ACGGaGTTGA TGTGccaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCGgCAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN*
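
The partial sequence <SEQ ID 319> contains lower-case ambiguity letters (for example y, s, r and m), and the corresponding residues in <SEQ ID 320> appear as X where the codon cannot be resolved. The following sketch illustrates one way such a translation could be carried out, assuming the letters follow the IUPAC nucleotide ambiguity codes and that a codon is reported as X unless every expansion gives the same amino acid; the function names are illustrative and this is not the software used for the original analysis. The input is nucleotides 550 to 579 of <SEQ ID 319>.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; return 'X' when the residue is not uniquely determined."""
    try:
        options = [IUPAC[base] for base in codon.upper()]
    except KeyError:
        return "X"  # unreadable base, e.g. '.' in the listing
    residues = {CODON_TABLE["".join(bases)] for bases in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a nucleotide string in frame 1, ignoring whitespace."""
    dna = "".join(dna.split())
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3))


fragment = "AAAGTTTATG ACTTGTAysr rTmmGCGGAA"
translated = translate(fragment)
print(translated)
```

The codon TAy resolves to tyrosine because both expansions (TAC and TAT) encode Y, whereas srr and Tmm each expand to codons for different residues and are therefore reported as X.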

Further work revealed the complete nucleotide sequence <SEQ ID 321>:

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC
451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 
551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 
601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC 
651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC 
701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA
751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC
801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT
851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA
901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT 
951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC 

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN*



Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 84 pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 
M | I I I ! I I I t I i t I i I i I I M I I i M I I I I I I : : I I I t I I M I I M I i I I I I I I I I I I I 
orf 84a MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 84 pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAG SKI PENVQWLNTHRH QG 
M I I I I M I 1 I I II I M I II II I I I II I I II I I I I I I I I I I I M I I I I I M I I I I I ! II I 
o-f8 4a LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 8 4 pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
| | I I M I II I I I I 1 I I I I I I I M I I I I I M M I I I II I M I I I I I II I I I M I I I I I I I 
0^84 a IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

130 140 150 160 170 180 



190 200 210 220 230 240 

o-f 84 .pep LDKKVYDLYXXAEVHTVNKVKRSKW FYTLPVIVLLIPVFVGL SYKMLSSYGKKQEEPAAQ 
I i | | M | | I I M I I I I I I I I M I I M I I I I : I I I I I I I I I I I I M I I I I I I I I I I I I I 
or f 8 4 a LDKKVYDLYESAEVHTVNKVKRSKW FYTLPVIILLIPVFVGL SYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 64 pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 
111111:111: I I M I I II I I It I I II I 1 I I I I I I II I I I I I I I I I I I I I I I I M I I : 
orf 84a ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 84 . pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 
I I I I | I I : I I I I I I I I II I : I : I I I I I : : I I I M I II I I I II I : : I I M I : I I I I I I 
orf 84a EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 

310 320 330 340 350 360 



370 380 390 

orf 84 . vbxj ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 
I I I 1 I I I I I I II I I I : I I I I I I I I I I I I I I I I I I I 
orf 84a ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 

370 380 390 

The complete length ORF84a nucleotide sequence <SEQ ID 323> is: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC 

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA 

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA 

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC 

1001 CGTTTAACCC ATATAAAGAA GAAAG CCAAG GGCGGGATGT CCAGCAAAGT 

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC 

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 324>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV
201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV
251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV
301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS
351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN*

ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 

                     10        20        30        40        50        60
orf84a.pep   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1      MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf84a.pep   LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1      LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf84a.pep   IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
             ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf84-1      IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf84a.pep   LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ
             ||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||
orf84-1      LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf84a.pep   ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV
             ||||||:|||: |||||||||||||||||||||||||||||||||||||||||||||||:
orf84-1      ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf84a.pep   EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV
             |||||||:|||||||||||:|: |||||::||||||||||||||::||||:| |||| ||
orf84-1      EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
                    310       320       330       340       350       360

                    370       380       390
orf84a.pep   ATLGGKPWQNLMYDNWQERGKPFEGIGGGVVGSANX
             ||||||| ||||||||:|||||||||||||||||||
orf84-1      ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
                    370       380       390

Homology with a predicted ORF from N. gonorrhoeae

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from N. gonorrhoeae:

orf84.pep    MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 60
             |||||||||||||||||||||||||||||||||:::||||||||||||||||:|||||||
orf84ng      MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 60

orf84.pep    LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120
             |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng      LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120

orf84.pep    IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 180
             |||||||||||||||||||||::|||||:||||:|||||||:||||||||||||||||||
orf84ng      IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 180

orf84.pep    LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 240
             |||||||||  ||:|||||||||||||:||||:||||:|||||||||:||||||||||||
orf84ng      LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 240

orf84.pep    ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 300
             |||||||||||||||||| ||||||||||||||| ||| |||||||||||||||||||||
orf84ng      ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 300

orf84.pep    EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360
             |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84ng      EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360

orf84.pep    ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSAN 395
             ||||||| |||||||||||||||||||||||||||
orf84ng      ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSAN 395



The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 
1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT
51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA
101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG
151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA
201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg
251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC
301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG
351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG
401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC
451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC
501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA
551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC
601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC
651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC
701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA
751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC
801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT
851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA
901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT
951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC
1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC
1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC
1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG
1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA



This encodes a protein having amino acid sequence <SEQ ID 326>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP
51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN
151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV
201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV
251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI
301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS
351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN*
ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap:

                     10        20        30        40        50        60
orf84-1.pep  MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK
             |||||||||||||||||||||||||||||||||||:||||||||||||||||:|||||||
orf84ng      MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf84-1.pep  LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
             |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng      LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf84-1.pep  IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT
             |||||||||||||||||||||::|||||:||||:|||||||:||||||||||||||||||
orf84ng      IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf84-1.pep  LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ
             |||||||||||||:|||||||||||||:||||:||||:|||||||||:||||||||||||
orf84ng      LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf84-1.pep  ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI
             |||||||||||||||||| ||||||||||||||| |||||||||||||||||||||||||
orf84ng      ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf84-1.pep  EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
             |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf84ng      EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV
                    310       320       330       340       350       360

                    370       380       390
orf84-1.pep  ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX
             ||||||| ||||||||||||||||||||||||||||
orf84ng      ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSANX
                    370       380       390

Based on this analysis, including the presence of a putative transmembrane domain (single-
underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
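
The motif A (P-loop) call made for the ORF84 proteins can be checked with a short pattern search. The sketch below assumes the PROSITE-style consensus [AG]-x(4)-G-K-[ST] for the ATP/GTP-binding site motif A; the text does not say which pattern or program was actually used, and the helper name find_p_loops is hypothetical.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping candidates be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str):
    """Return (1-based start, matched residues) for each motif A candidate."""
    seq = "".join(protein.split()).upper()  # drop the spaces used in the listing
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]


# First 50 residues of the gonococcal protein (SEQ ID 326), copied from the listing.
orf84ng_nterm = "MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP"
print(find_p_loops(orf84ng_nterm))
```

On this fragment the only candidate is GTPGSGKT starting at residue 9, which lies in the N-terminal region conserved between the meningococcal and gonococcal sequences.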



Example 39 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 327>:

1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT
51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC
101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG
151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT
201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA
251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA
301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT
351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG
401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT
451 ACTCAGGAAG GTCACAAATA CACCAAT TACCG
501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC
551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC.
601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA
651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC
701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC
751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA
801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC
851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG
901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA
951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA
1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT
1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT
1101 TTTGGTCTAT CTC...

This corresponds to the amino acid sequence <SEQ ID 328; ORF88>: 

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV
151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM
301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF
351 SEVRSSGLQM TRSXGPLLVY L...
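
The X residues in the partial ORF88 sequence appear to correspond to codons that contain an unread base (written "." in SEQ ID 327). A minimal sketch of that relationship, assuming the standard genetic code and treating any codon with a non-ACGT character as X; the function translate is a hypothetical helper, not the program used to produce the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 1; codons with unread bases become X, stops become *."""
    seq = "".join(dna.split()).upper()  # drop the 10-base spacing of the listing
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Bases 901-930 of SEQ ID 327, copied from the listing (one unread base).
fragment = "AACGCTGCTT TGGATGAAAC CAT.ACCCGG"
print(translate(fragment))  # residues 301-310 of the partial ORF88 sequence
```

The fragment translates to NAALDETXTR, matching residues 301-310 above. Note that the listing writes M for the GTG initiation codon at position 1 of SEQ ID 327, whereas this table-driven sketch would return V for that codon.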

Further work revealed the complete nucleotide sequence <SEQ ID 329>: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG
201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT
251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG
301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA
401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG
1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC
1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT
1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC
1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC
1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC
1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG
1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG
1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA
2001 CTTGAATCAT GACTGA

This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N. meningitidis:

                                                   10        20        30
orf88.pep                                  MVFLNADNGILVQDLPFEVKLKKFHIDFYN
                                           :|||||||||||||||||||||||||||||
orf88a       AKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGILVQDLPFEVKLKKFHIDFYN
             210       220       230       240       250       260

                     40        50        60        70        80        90
orf88.pep    TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88a       TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD
             270       280       290       300       310       320

                    100       110       120       130       140       150
orf88.pep    ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV
             |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf88a       ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV
             330       340       350       360       370       380

                    160       170       180       190       200       210
orf88.pep    TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI
             ||||:||||      |||||| ||||||||||||||||||||||||||| ||||||||||
orf88a       TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI
             390       400       410       420       430       440

                    220       230       240       250       260       270
orf88.pep    PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL
             |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf88a       PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLNIFAQKGYL
             450       460       470       480       490       500

                    280       290       300       310       320       330
orf88.pep    GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM
             |||||||||||||||||||||||||||||||||||||  |||||||||||||||||||||
orf88a       GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM
             510       520       530       540       550       560

                    340       350       360       370
orf88.pep    DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGPLLVYL
             ||||||||||||||||||||||||||||||||| | |||||
orf88a       DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPGALLVYLGSVLLVLGTVLMFYVREKR
             570       580       590       600       610       620

orf88a       AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX
             630       640       650       660       670



The complete length ORF88a nucleotide sequence <SEQ ID 331> is:

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG
201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT
251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG
301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA
401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTGGGA
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG
1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC
1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT
1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC
1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC
1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC
1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG
1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG
1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA
2001 CTTGAATCAT GACTGA

This encodes a protein having amino acid sequence <SEQ ID 332>:

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*

ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap:

orf88a.pep   MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60

orf88a.pep   QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88a.pep   SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180

orf88a.pep   GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88a.pep   LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88a.pep   LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88a.pep   SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88a.pep   PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88a.pep   GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540

orf88a.pep   LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88a.pep   PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88-1      PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660

orf88a.pep   LQRLGKDLNHD 672
             |||||||||||
orf88-1      LQRLGKDLNHD 672
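
The identity figures quoted for these alignments can be recomputed from any pair of aligned rows. A minimal sketch, assuming identity is the share of gap-free aligned columns that hold the same letter; how the original alignment program scored X positions is not stated, so characters are simply compared literally here. The helper name percent_identity is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of gap-free aligned columns holding identical residues."""
    a = "".join(row_a.split()).upper()
    b = "".join(row_b.split()).upper()
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # Skip columns where either row has a gap character.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Residues 1-60 of ORF88a and ORF88-1, copied from the listing.
orf88a_block = "MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA"
orf88_1_block = "MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA"
print(round(percent_identity(orf88a_block, orf88_1_block), 1))
```

Applied block by block and pooled over the whole overlap, the same count gives the percentages reported in the headers of the alignments.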



Homology with a predicted ORF from N. gonorrhoeae 

ORF88 shows 93.8% identity over a 371aa overlap with a predicted ORF (ORF88.ng) from N. 
gonorrhoeae: 
orf88.pep    MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60
             |||||||||:||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng      MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60

orf88.pep    PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFD 120
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf88ng      PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPVVLKATSIHQFPLEIGKHKYRLEFD 120

orf88.pep    QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180
             |||||||||||||||||||||||| |||||||||:||||      |||||| ||||||||
orf88ng      QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 180

orf88.pep    YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 240
             ||||:||::||||:||||| |||||||||||||||||||||||||||||||||||| |||
orf88ng      YMLPILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVAD 240

orf88.pep    ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 300
             ||| |||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf88ng      ATKDAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 300

orf88.pep    NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360
             |||||||  |||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng      NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360

orf88.pep    TRSXGPLLVYL 371
             ||| | |||||
orf88ng      TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 420
An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino
acid sequence <SEQ ID 334>:

1 MVFLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV
151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM
301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF
351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI
401 RFAMSSARSE RDLQKEFPKK VESLQRLGKD LNHD*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 

1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA
101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT
151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG
201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC
251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG
301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA
401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG
501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA
551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA
801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG
1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC
1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT
1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC
1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA
1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC
1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT
1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT
1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC
1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG
1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac
1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag
1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG
1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA
2001 CttgaaTCAT GACTga

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*

ORF88ng-1 and ORF88-1 show 97.0% identity in 671 aa overlap:

orf88-1.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60
             ||||| || ||||||||||||||||||||||||||||||||||||||||||||||| ||:
orf88ng-1    MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf88-1.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120
             :|| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1    RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf88-1.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180
             |||||||||||||||||||:||||||::|||||||||||||||||||||||:||||||||
orf88ng-1    SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180

orf88-1.pep  GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240
             |||||||||||||||:||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1    GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 240

orf88-1.pep  LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300
             ||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1    LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

orf88-1.pep  LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360
             ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
orf88ng-1    LHGITIYQASFADGGSDLTFKAWNLRDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360

orf88-1.pep  SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1    SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420

orf88-1.pep  PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480
             |:||::||||:|||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1    PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

orf88-1.pep  GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540
              |||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf88ng-1    DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 540

orf88-1.pep  LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf88ng-1    LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600

orf88-1.pep  PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660
             ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf88ng-1    PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660



600 



600 



660 



660 



LQRLGKDLNHD 
1 I I! I I I I I I I 
LQRLGKDLNHD 



671 
671 



55 



60 



65 



Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus:

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Score = 94.4 bits (231), Expect = 2e-18

Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)

Query: 16 FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 7 4 

+ F +S++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S 

Sbjct: 80 YDFLASLKLAIFIMLVLGILSMLGSTYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139 

Query: 7 5 AWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRHSSLLDVKIAPEVAK 134 

++++ ++ L V+ C 1+ +P W++ S +E++ + A +H + VKI P+ K 
Sbjct: 140 WYYILFIVLLAVNLIFCSIKRLPRVWKQAFS-KERILKLDEHAEKHLKPITVKI-PDKDK 197 

Query: 135 — RYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192 

++L +GF+ V E * + + A+KG ++ G +AL+VI G LID 
Sbjct: 198 VLKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 24 9 
Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGMLVQDL 252

+ I+G RG++ ++EG + DV+ + A+ L 

Sbjct: 250 AIVGV RG S L I VAEG DTN DVMLVG AE — QKP YKL 280 

Query 253 P FE VKLKKFH I DFY NTGMPRDFA SDIEVTDKATGEKLER — TIRVNHPLT 300 

PFVLFIY N+ + FA SDIE+ + G K+E T++VN P 
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKVNEPFD 337 

Query: 301 LHGITIYQASFA--DGGSDLTFKAWNLRDASREP 332

++QA++ DG S + + + A +P 

Sbjct: 338 FGRYRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 
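
The percentages in the alignment summary above follow directly from the printed counts. The sketch below recomputes them; it assumes whole-number percentages truncated toward zero, which is an assumption that happens to reproduce the printed values, not something the report states. The function name `blast_percent` is illustrative only.

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Whole-number percentage of aligned positions, truncated toward zero (assumed)."""
    return (100 * count) // aligned_length


# Counts taken from the summary line of the Aquifex aeolicus hit above.
ALIGNED_LENGTH = 334
COUNTS = {"Identities": 91, "Positives": 159, "Gaps": 59}

for name, count in COUNTS.items():
    print(f"{name} = {count}/{ALIGNED_LENGTH} ({blast_percent(count, ALIGNED_LENGTH)}%)")
```
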

Based on this analysis, including the putative transmembrane domain in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 40 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 337>:



1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT
51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT
101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC
301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 



1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 
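
The relationship between the nucleotide sequence and the amino acid sequence above can be sketched as a codon-by-codon translation with the standard genetic code. This is a minimal illustration, not the software used for the original analysis: the function name `translate` is hypothetical, and rendering any codon that contains an ambiguous base (such as `m` or `y`) as `X` is an assumption that matches the `X` positions printed in the sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with non-ACGT symbols become 'X' (assumed)."""
    cleaned = "".join(dna.split())  # drop the spaces used to group bases in tens
    protein = []
    for i in range(0, len(cleaned) - len(cleaned) % 3, 3):
        codon = cleaned[i:i + 3]
        if all(base in "ACGT" for base in codon):
            protein.append(CODON_TABLE[codon])
        else:
            protein.append("X")  # ambiguous base, residue cannot be called
    return "".join(protein)


# First 60 nucleotides of SEQ ID 337 and its final 39 nucleotides, as printed above.
start = "ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT GATAGTCGTC"
end = "GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA"
print(translate(start))
print(translate(end))
```

The same function can be applied to the other nucleotide sequences in this section to cross-check the printed translations.
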

Further work revealed the complete nucleotide sequence <SEQ ID 339>: 



1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT
51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT
101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC
301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>:

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Computer analysis of this amino acid sequence gave the following results:
Homology with PilE of N. gonorrhoeae (accession number Z6926C0)
ORF89 and PilE protein show 30% aa identity in 120 aa overlap:



orf89 8 QKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66

QKGFTLI MIV+AI+GI++ +A+P+Y Y + S+ G + ++L + +

PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64

orf89 67 -DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125 

DN + +G + KI KY SV + GV K G LS+W 

PilE 65 PKDNTS AGVAS S DKIKGKYVQS VTVAKGVVTAEMASTGVNKE I QGKKLS LW 115 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of N.
meningitidis:






10 20 30 40 . 50 60 

orf 8 9. pep MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 

I I I I I I I 1 I I I I I I II 1 I I ! I I I I I I I I I I ! I I I II I I I I I I I I 

orf 8 9a MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 8 9. pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
I I 1 ! I 11 I I I I I : : I I I ! I I I I I I M I I II : I I : I i I : I I : : I I III I I I II I : I I I I 
orf 8 9a ILKNPLDDNQTIKSKLEI FVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

70 80 90 100 110 120 

130 140 150 160 

orf 8 9 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSS DVGCEAFSNRKKX 
I I I II I I I I I I I I I I II I I M : I I 1 I I I I I I I I I I I I I ! I M I 
orf 8 9a TLSVWMNSVGDGYKCRDAASARAHLETLSS DVGCEAFSNRKKX 

130 140 150 160 

The complete length ORF89a nucleotide sequence <SEQ ID 341> is:

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT
51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT
101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC
301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG



This encodes a protein having amino acid sequence <SEQ ID 342>:

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM

51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV

101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS

151 DVGCEAFSNR KK*

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap:






10 20 30 40 50 60 

orf 89a . pep MMS N KMEQKG FT L I XXXXXXAI XXXX S VI XXXX YX S Y I EKG YQSQL YTEMVG I NN I SKQX 

I I M M I I I M I I I II Ml I I 1 I I I I I I I I I M I I I 1 M I I I I I 

orf 8 9-1 MMSNKMEQKG FTLIEMMIWAILGI I SVIAI PS YQSYIEKGYQSQLYTEMVG INN I SKQF 

10 20 30 40 50 60 

70 80 90 i00 110 120 

orf 8 9a. pep I LKNPLDDNQT I KSKLE I FVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSL VGVPKTGTGY 
I II I I I I I I I I I : : I I I M II I II M I M I : I I : I I I : I I : : M Mi M M II : I I I I 
orf 8 9-1 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

70 80 . 90 100 110 120 

130 140 150 160 

orf 8 9a. pep T LSVWMN SVG DGYKCRDAAS ARAHLETLS S DVGCEAFSNRKKX 
It II I M I II I I I I I I I I I I I M M I I II M I I I II I I M M I 
orf 8 9-1 • TLSVWMNSVG DGYKCRDAASAQAHLETLSS DVGCEAFSNRKKX 
130 140 150 160

Homology with a predicted ORF from N. gonorrhoeae

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from N.
gonorrhoeae:



orf89 



MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60 
Ml, | || MM! 1111:1111 I MMI III I I I I M I I I I I I I I I Mil: III 
o-f89ng I^SNKMEQKGFTLIEMMIVVTI LGI I SVIAI PSYQS YIEKGYQSQLYTEMVGINNVLKQF 60 



10 orf89 



ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 



120 



Mill I M : I ::: 11 : I I I I I H I I I I I I I I I I I I I : I II II II 1 II M M : I I I I I 
orf89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 



orf89 



orf89ng 



TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162 

15 | M M M I 11 I I M I I I I: M I I : M II : I M M M M I I 

TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 162 



The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT
51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT
101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 CTCGGTATCA acaatgttct caaacagttt attttgaaaa atccccagga
201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC
301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA
451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG

This encodes a protein having amino acid sequence <SEQ ID 344>: 

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV
101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA
151 DSGCEAFSNR KK*

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3%
identity in 162 aa overlap:

10 - 20 30 40 50 60 

orf89-l Pep MMSNKMEQKGFTLIEMMIWAILGI I SVIAI PSYQS YIEKGYQSQLYTEMVGINNISKQF 
* P * M I II II I UIMM II III Mill II II Mill MMI II I U II II I II I Ml: Ml 
0 rf89nc MMSNKMEQKGFTLIEMMIWTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 
40 " 10 20 30 40 50 60 

70 80 90 100 110 120 

or^89-' Pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
"* P P MMI IMM::MIMMMMMMHMMMI:III II MIMMMMIMI 
45 c-f8 9nc ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 

~ 70 80 90 100 110 120 

130 140 150 160 

orf89-^ pep T LS VWMN SVG DG YKCRDAAS AQ AHLET LS S DVGCE AFSNRKKX 
50 * 11 II I I I M II I I I M II : I M I : M I I M 11 M I I II 1 M 

orf89nq TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 

130 140 150 160 

Based on this analysis, including the gonococcal motifs and the homology with the known PilE
protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
ORF89-1 (13.6kDa) was cloned in the pGex vector and expressed in E.coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA
251 AACAAGCGTT GGCCn.AGAA TTTCAACCC...

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP...

Further work revealed the complete nucleotide sequence <SEQ ID 347>:

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG
401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC
451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG
551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A

This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF91 shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) from strain A of N.
meningitidis:

.10 " 20 30 -40 50 60 

40 orf 91 .Dep MKK S S L I SAL G I G I L S I GMAFAAP ADA V S Q I RQN AT QVLS I LKN G DAN T ARQKAE A YAI P 

I I i I I : 1 1 I M I i I I I II II I i I I I I I t : I I I I I 1 II I I II It : M I I I I I I M I I I I I I 
orf 91a MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 

10 20 30 40 50 60 

45 70 80 . . 90 

orf 91 .pep YFDFQRMTALAVGNPWXTXS DXQKQALAXE FQP 

I I I I I I I 1 I I 1 I I I I I I I I MINI I I i 
orf 91a YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKN AN VNVKDNPIVN 

70 60 90 100 110 120 
orf91a KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK

130 140 150 160 170 180 

The complete length ORF91a nucleotide sequence <SEQ ID 349> is:

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG
401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC
451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG
551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK*

ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 

10 20 30 40 50 60 

orf91a pep MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 
25 " ' * M I II : II 1 M I I M I I M I I 11 I I II I : 1 It! M I ! I I II II : I I i I II I I M M Ml I 

orf 91-1 MKKS SL I S ALG IG I LS I GMAFAAPADAVSQ IRQNATQVLS I LKNG DANTARQKAEAYAI P 

10 20 30 40 50 60 

70 80 90 100 110 120 

30 orf 91a p«p Y FD FQRMT ALAVGN PWRTA S DAQKQALAKE FQTLL I RT Y S GTMLKLKNAN VNVKDN P I VN 

I { | | M II I I M I I II I 1 I I I t I I 11 I II M I II I I I I I I I I I I 1 I M I I 

orf 91-1 YFDFQRMTALAVGNPWRTAS DAQKQALAKE FQTLLIRTYSGTMLKLKN AN VNVKDNPIVN 

70 80 90 100 110 120 

35 130 140. 150 160 170 180 

O-f 91a peD KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
I I | I 1 | I | I I I I I I I I I I M I I I I I I I I I I I 1 M I I I I I I I I M I H I I I I M I I I I I I I 
O r f 9 1 - 1 KGGKE 1 1 VRAE VGV PGQK PVNMD FTT YQSGGKYRT YNVAI EGAS LVTVYRNQFGEI IKAK 

130 140 150 160 170 180 

40 

190 

orf 91a. oeo GVDGLIAELKAKNGSKX 
I I II I I II II I I II : II 
orf91-l GVDGLIAELKAKNGGKX 
45 190 

Homology with a predicted ORF from N. gonorrhoeae

ORF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (ORF91.ng) from N. 
gonorrhoeae: 

50 orf 91. pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 60 

: i | | | : I | M I I I I I I I I I 11 I : t I M t : I I I I M I I II : I I I : M I : M I I I I I I : I 
orf 91ng VKKSSFISALGIGILSIGMAFAS PADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60 

or f 91 . pep Y FDFQRMT ALAVGN PWXTXS DXQKQALAXE FQ P 93 
55 I I I I I M II I II I I I I I M MINI III 

orf 91ng YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120 

The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>:

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA
51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

Further work revealed the complete nucleotide sequence <SEQ ID 353>:

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA
101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA
151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA
401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC
451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG
551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A



This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA
51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:






10 20 30 40 50 60 

orf 91-1 . pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 
I I I I I : I If I I I I M I I I I II I : I t I I I : I I I 1 I I I I I I : I I I : I I I : I I I I I I M : I 
orf 91ng-l MKKSS FI SALGIGI LS IGMAFAS P AD AVGQ IRQNATQVLT I LKSGDAASARPKAEAYAVP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 91-1 . pep Y FD FQRMT ALAVGN PWRTAS DAQKQALAKE FQT LL I RT Y SGTMLKLKN ANVN VKDN P I VN 
I M I I I I I I I I II I I I I ! II I I I I II II II i I I I I I I I M I M I I : I II : I I I I I I I I I I 
orf 91ng-l YFDFQRMT ALAVGN PWRTAS DAQKQALAKE FQTLLIRTYS GTMLKFKNAT VNVKDNPIVN 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 91-1 .pep KGGKEI IVRAEVGVPGQK PVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEI IKAK 
I I I I I I : II M I I : I I I I II I It I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I M I I I I 
orf 91ng-l KGGKEI WRAEVGI PGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK 

130 140 150 160 170 180 



45 



orf 91-1 . pep 
orf 91ng-l 



190 

GVDGLIAELKAKNGGKX 
I : I I I I I I II I I I I I I I 
GIDGLIAELKAKNGGKX 
190 



In addition, ORF91ng-1 shows homology to a hypothetical E.coli protein:






sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC
REGION PRECURSOR (F211) >gi|606130 (U18997) ORF_f211 [Escherichia coli]
>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic
region [Escherichia coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 5 9 VPYFDFQRMTALAVGN PWRTAS DAQKQALAKE FQTLL I RTYSGTMLKFKNATVNVKDNP I 118 

+PY. + AL +G +++A+ AQ++A F+L + Y + + T + P 
Sbj ct : 65 LPYVQVKYAGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA — PE 122 

Query: 119 VNKGGKEI V-VRAEVG I P-GQKPVNMDFTTYQSG- -GKYRTYNVAI EGTSLVTVYRNQFG 174 

G K IV +R + P G+ PV +DF ++ ' G ++ Y++ EG S++T +N++G 
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182 
Query: 175 EIIKAKGIDGLIAELKA 191

+++ KGIDGL A+LK+ 
Sbjct: 183 TLLRTKGIDGLTAQLKS 199 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA
101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn
151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX

51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

45 orf 97. pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 

I I I I 1 I I 1 I I M I I II I I I I M : I I I I II I I M I I I I M : : I II II I 

orf 97a MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

10 20 30 40 50 60 
70 80 90 100 110 120

orf 97 . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

I H | | | M II I I I I I I I I I I I I I I I I I M II 1 M 1 I I I I I M I I I I I I I I I I I i I I I I 1 
orf 97a MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

5 70 80 90 100 110 120 

130 140 150 160 

or f 97 . pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I : I I I 
10 or f 9 7 a VRAAYTDTRALIAGSRIGFDE VANTLANAEKLI QKT I GEX 

130 140 150 160 

The complete length ORF97a nucleotide sequence <SEQ ID 359> is:

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCAT AGGCGAATAA

This encodes a protein having amino acid sequence <SEQ ID 360>: 

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE* 

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 

10 20 30 40 50 60 

30 orf 97a . pep MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

I Mill MINIUM I M II I : II M II M 1 II I II II I M I II II I M M M II 
orf 97-1 MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKS'KG 

10 20 30 40 50 60 

35 70 80 90 100 110 120 

orf 97a . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 
I I I I I I I I I I I I I I I I I I I I M I I I I! I II I I I I I I I I I I I 11 I M I I I I I I I I II I II 
o r f 9 7 - 1 MDI FAV I DHQEAARRNGLTMQPAKV I V FGT PKAGT PLMVKDPAFALQLPLRVLVTETDGK 

70 80 90 100 110 120 

40 

130 140 150 160 

orf 97a. pep VRAAYTDTRALI AGSRI GFDEVANTLANAEKLI QKT I GEX 
I I I I I I II 1 I I I I I I I I M II I I M I I I I I I I M I I : M I 
orf 97-1 VRAAYTDTRALI AGS RIGFDE VANTLANAEKLI QKT VGEX 

45 130 140 150 160 
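
The identity figure quoted for ORF97a and ORF97-1 can be reproduced from the two printed amino acid sequences. The sketch below assumes a gapless, position-by-position comparison of the 159 residues (stop codon excluded) in which an undetermined residue `X` counts as a mismatch; the alignment program used in the original analysis is not named in this section, so this scoring rule and the function name `percent_identity` are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two equal-length, gapless sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length for a gapless comparison")
    # An 'X' in either sequence never equals a called residue, so it scores as a mismatch.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "X")
    return 100.0 * matches / len(seq_a)


# SEQ ID 360 (ORF97a) and SEQ ID 358 (ORF97-1) as printed, without the stop symbol.
ORF97A = (
    "MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVS"
    "RLETAIKSKGMDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVK"
    "DPAFALQLPLRVXVTETDGKVRAAYTDTRALIAGSRIGFDEVANTLANAE"
    "KLIQKTIGE"
)
ORF97_1 = (
    "MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVS"
    "RLETAIKSKGMDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVK"
    "DPAFALQLPLRVLVTETDGKVRAAYTDTRALIAGSRIGFDEVANTLANAE"
    "KLIQKTVGE"
)

print(f"{percent_identity(ORF97A, ORF97_1):.1f}% identity in {len(ORF97A)} aa overlap")
```

Under these assumptions 152 of 159 positions match, which rounds to the 95.6% stated in the text.
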

Homology with a predicted ORF from N. gonorrhoeae 

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N. 
gonorrhoeae: 

50 orf 97 .pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60 

MUM M I I I : I M I II I I I I :: I II II I I I MM MM! : : I I I II I 

orf 97ng MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 

orf 97 .pep MDI FAV I DHQEAARRNGLTMQPAKV IV FGT PKAGT PLMVKDPAFALQLPLRVLVTETDGK 120 
55 * " I I I I I I I I M I II I I I I t M I I I* I I I I I M 1 I I M I M I M II I II I I 1 II I If I I I II I 

orf 97ng MDI FAVI DHQEAARRNGLTMQPAKVI VFGTPKAGT PLMVKDPAFALQLPLRVLVTETDGK 120 

orf 97 .pep VRAAYT DTRAL I AG SR I GFDE VANTLANAEKLI QKT VGE 159 
I I : I II I M I 1 !: II I I : I I M 11 I I I I II I I I I I I I I I 
60 orf97ng VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE 159 
The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein
having amino acid sequence <SEQ ID 362>:

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 

1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC
51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA
101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG
401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>:

1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap:

10 20 30 40 50 60 

25 o-f97-l pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

| | | 1 I I ! I M I I I I I 1 I I I 1 I I I I I II I M I I I I I I M I I I I I H I i 

orf97nq-l MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

10 20 30 40 50 60 

30 70 80 90 100 110 120 

or ^97_l pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
I | i M | | | | 1 I I | ! || I I I I II I I II I I M M I I I ! I I M I I 11 I I I I M I I I I I I I I I I 
orf 97nq-l MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

70 80 90 ■ 100 110 120 

35 

130 140 150 160 

orf 97-1 . pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 
| | : I I I I I I I M : M I I : I I I I I I I ! II I I I I I I I M I I I 
orf 97ng-l VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX 
40 130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen.
Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1.
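
A hydrophilicity plot of the kind mentioned for Figure 12E is typically a sliding-window average of per-residue scale values. The sketch below is only an illustration of that idea: it assumes the Hopp-Woods scale and a six-residue window, neither of which is stated in this section, and the function name `hydrophilicity_profile` is hypothetical. It does not reproduce the antigenic index or AMPHI calculations.

```python
# Hopp-Woods hydrophilicity values per residue (assumed scale; not named in the text).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein: str, window: int = 6) -> list[float]:
    """Mean scale value for every full window along the sequence."""
    values = [HOPP_WOODS[residue] for residue in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# First 50 residues of ORF97-1 (SEQ ID 358) as printed in this example.
segment = "MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVS"
profile = hydrophilicity_profile(segment)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophilic window starts at residue {peak + 1}: {segment[peak:peak + 6]}")
```

Windows with high averages are candidate surface-exposed stretches, while the strongly negative windows near the N-terminus would be consistent with a hydrophobic leader sequence.
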



Example 43 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
365>:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT 

351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 
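
The correspondence between a nucleotide sequence and its amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the use of `X` for codons containing unread or ambiguous bases are illustrative choices and are not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Codons with characters outside A/C/G/T (for example N or '.') are
    reported as 'X'. Trailing bases that do not fill a codon are ignored.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 and last 9 nucleotides of SEQ ID 365 as printed above.
print(translate("ATGGCTTTTA TTACGCGCTT ATTCAAAAGC"))  # MAFITRLFKS
print(translate("AACAAATAA"))  # NK*
```

Whitespace and lower-case bases are normalised first, so rows can be pasted as printed in the listing.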

Further work revealed the following DNA sequence <SEQ ID 367>: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N.
meningitidis:

                     10         20        30        40        50       59
orf106.pep   MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ
             |||||||||| | ||    ||      |||||||||||||| ||||||  ||||||||||
orf106a      MAFITRLFKSIKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ
                     10        20        30        40        50        60

            60        70        80        90       100       110      119
orf106.pep   LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA
             || |  ||| || || ||||||||||||| ||||||||| ||||||||||| ||||||||
orf106a      LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA
                     70        80        90       100       110       120

           120       130       140       150       160       170      179
orf106.pep   FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT
             ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf106a      FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT
                    130       140       150       160       170       180

           180       190      199
orf106.pep   SQNWHLDSGWKPLNIIGNKX
             ||||||||||||||||||||
orf106a      SQNWHLDSGWKPLNIIGNKX
                    190       200

Due to the K→N substitution at residue 111, the homology between ORF106a and ORF106-1 is
87.9% over the same 199 aa overlap.
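
The identity figures quoted here can be reproduced from an alignment by counting identical columns. The sketch below assumes that percent identity is the number of identical aligned residues divided by the overlap length; `percent_identity` is a hypothetical helper name, and the match counts 174 and 175 are inferred from the quoted percentages rather than stated in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Both rows must be pre-aligned and of equal length; '-' marks a gap
    and never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy 10-column alignment with one gap and one substitution (8 matches).
print(round(percent_identity("MAFIT-KRYR", "MAFITQNRYR"), 1))  # 80.0

# One extra identical residue over a 199 aa overlap: 174 -> 175 matches.
print(round(100 * 174 / 199, 1), round(100 * 175 / 199, 1))  # 87.4 87.9
```

Under this assumption, a single residue change that creates one additional match over 199 columns raises the figure by about half a percentage point, which is consistent with the two values given.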

The complete length ORF106a nucleotide sequence <SEQ ID 369> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA

This encodes a protein having amino acid sequence <SEQ ID 370>:

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK*



Homology with a predicted ORF from N. gonorrhoeae

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N.
gonorrhoeae:

orf106.pep   MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ  59
             |||||||||| | ||     |      |||||   |||||||||| ||||||||||||||
orf106ng     MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ  60

orf106.pep   LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA  119
             |||||||||||||||||||||| |||||||||||||||||||||||||||| ||||||||
orf106ng     LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA  120

orf106.pep   FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT  179
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf106ng     FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT  180

orf106.pep   SQNWHLDSGWKPLNIIGNK  198
             |||||||||||||||||||
orf106ng     SQNWHLDSGWKPLNIIGNK  199



Due to the K→N substitution at residue 111, the homology between ORF106ng and ORF106-1 is
91.0% over the same 199 aa overlap.

The complete length ORF106ng nucleotide sequence <SEQ ID 371> is:

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN 

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 44 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
373>:

1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT

1401 GAAAAAACAA GGTTTCCCAT TATGA


This corresponds to the amino acid sequence <SEQ ID 374; ORF10>:

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA

451 GCILRHRKDL HKLFHYLKKQ GFPL*



Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA



This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>:

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE

401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG

451 CILRHRKDLH KLFHYLKKQG FPL*

Computer analysis of this amino acid sequence gave the following results: 



WO 99/24578 



PCT/IB98/01665 



Prediction 

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide.

Homology with EpsM from Streptococcus thermophilus (accession number U40830).
ORF10 shows homology with the epsM gene of S. thermophilus, which encodes a protein of a size
similar to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with
prokaryotic membrane proteins:

Identities = (25%)

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270
           L Y +PL  SS+ +W L ++ R F+  + G    G+ ++         +  +IF+  W
Sbjct: 210 LYYALPLIPSSILWWLLNASSRYFVLFFLGAGANGLLAVATKIPSIISIFNTIFTQAW 267

Identities = 15/57 (26%), Positives = 31/57 (54%)

Query: 7   LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63
           L +  G++GS +L  +++PL ++     + G   L QT A L + ++ + +  A +R
Sbjct: 12  LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68

Identities = 16/96 (16%), Positives = 36/96 (37%)



Query: 307 IFSPLASLLLPEKYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIXXXXXXXXXX 3 66 

+ F+ +~ +YA+ V ML LF + ++ G ++T + + 

Sbjct: 305 VLKPI VEK"v^SSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSI YGTIV 364 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

or f 10. pep MDTKEILXYAAGSIGSAVLAVI ILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

I ! I I i I I I M I I I II j I I M I I II I I i M I i I I I I I I I I I I I I I I M ! I I I I ! I I I! I I 
or f 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10. pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I It I I I I : I I I I I I I I 1 II I I I I M I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I II I I 
orf 10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 10 .pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I I I I I I I II II I I I I I I I I I I I ! 1 I II I I I I t I I I II II II I I I I I I I I I I I I I I I M 
orf 10a LSFLPIRFLLLVLRMEGRALJi.FSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10 . pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
I I II I I I I M I I I I I I I I I I : I I I I MINI I I I I I I I I I I I I i M I M I I I I I 11 I I 
orf 10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10. pep AG LEQLG VY SMG I S FGG AALL FQS I FS T VWT P YI FRAI EENAP PARLS AT AE S AAALLAS 
I I I I I I I 11 M l I I I I I I I I I I M I 11 I I I I -I I M I I I I II I I I I I I I I I || M | I I 1 I 
or f 1 0a AGLEQLGVYSMG I S FGGAALLFQS I FSTVWTPYI FRAI E AN AP PARLS AT AES AAALLAS 

250 260 270 280 290 300

310 320 330 340 350 360

orflO pep ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 
Ml M I I I M I II I II I M I I I I ! I M I I I I I I I I I I : I M I I I II I I I I I I I I I I M 
orf 10a ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 
5 310 320 330 340 350 360 

370 380 390 400 410 419 

orf 10 pep LGALAAN LLLLGLDRAV PAR- PXGAAVACAAS FWL FFAFKTE S S CRLWQPLKRLPLYLHT 
I | | | | | I | | | M I Ml: I I I I I I I I I I I I I I : II I M I I I I I I I I I I I I I I : I I 
10 orf 10a LGALAANLLLLGL — AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 

420 430 440 450 460 470 

o-f 10 pep LFCLT S SAAYT C FGT PAN YPLFAGVWAAYLAGC I LRHRKDLHKLFH YLKKQG FPLX 
15 ^ | | M : | | | | ! I I I I I I I I M I I I i 1 I I : I ! 1 M I M I I I I I I M I I I I I I I I I I I I 

orf 10a L FCLAS S AAYT C FGT PAN Y PL FAG VW A VYLAGC I LRHRKDLHKLFH YLKKQG FPLX 

420 430 440 450 460 470 

The complete length ORF10a nucleotide sequence <SEQ ID 377> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA

This encodes a protein having amino acid sequence <SEQ ID 378>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG

451 CILRHRKDLH KLFHYLKKQG FPL*

ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap:

10 20 30 40 50 60

orf 10-1 .pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
t I I ( I I I I I I 11 I I I II I I I I I I I I I I I I I II I i I I II M I I I I I I I I I M I I I II I I I 
orf 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10-1 . pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
i M I I I I : I I I I I I M I I I M M I M I I I ! I I I I M I I I I ! I I I I I I I I I I I II It I I I I 
orf 10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 10-1 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I I I I I I I I I I I I I I I I I I I I I I I M I I I II I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 
orf 10a LSFLPIRFLLLVLRMEGRAIAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10-1 . pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGI PIALSS IAYWGLASADRLFLKKY 
I I II I I I I I I I I I I I I I I 1 I : I I I I Mill! I | I I I M | | | f I I I I I | | I I I I I I I I I 
orf 10a NLAAAAFLLFQNRCRLKAVRRAP FS S AVLHRGLRYG I PI ALS S IAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10-1 . oep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
1 I I II I I II 1 I II II I I II II II I I II I I I I I I I ! II I I I I I I I I I I I II I 1 I I I I I i I 
orf 10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 ' 350 360 

orf 10-1 .peD ALCXTG I FS PLASLLL PEN YAAVRFI WSCMXP PLFCTLAE I S G I GLNWRKTRP I ALAT 
1 I I II II II I II I I I I I I I I I I I I I I II I I I I II I I I : I I I M I I I I I I 1 M II I I I I 
orf 10a ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 

370 380 390 400 410 419 

orf 10-1 . pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
I I I I I I I I I I I I I Ml: II II I I I I I I I I I I : I M I I I M I I I I I I II II I : I I 

orf 10a LGALAANLLLLGL — AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 

420 430 440 450 460 470 

orf 10-1 . pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I I I : M I I I I I I I I I If I I I I ! M II : I I I I II I I I II I M I I I I I I I I II I I I I 
orf 10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFFLX 
420 430 440 450 460 470 



Homology with a predicted ORF from N. gonorrhoeae

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N.
gonorrhoeae:

orf10ng.pep   MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA  60
              ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10nm       MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA  60

orf10ng.pep   YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE  120
              ||||||| ||||||||||||||||  ||||||||||||||||||||||||||||||||||
orf10nm       YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE  120

orf10ng.pep   LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA  180
              |||||||||||||||||||||||||||||||||||| ||||||||||||| |||||||||
orf10nm       LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA  180

orf10ng.pep   NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY  240
              |||||||||||||||||||| ||||||||||| ||||| |||| ||||||||||||||||
orf10nm       NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY  240

orf10ng.pep   AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS  300
              ||||||||||||||||||||| |||||||||||||||||||| |||||||||||||||||
orf10nm       AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS  300

orf10ng.pep   ALCLTGIFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT  360
              ||| ||||||||||||||||||||| ||||| |||| || ||||||||||||||||||||
orf10nm       ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT  360

                     370         380       390       400       410
orf10ng.pep   LGALAANLLLLGL--AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT
              |||||||||||||  |||     |||||||||||||| ||||||||||||||||||| ||
orf10nm       LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT
                     370       380        390       400       410

             420       430       440       450       460       470
orf10ng.pep   LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX
              |||| |||||||||||||||||||||||||||||||||| ||||||||||||||||
orf10nm       LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
            420       430       440       450       460       470

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG

151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA

451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG

601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT

651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC

701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC

801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC

951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc

1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA

1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC

1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC

1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA



This encodes a protein having amino acid sequence <SEQ ID 380>:

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG

451 CILRHRKNLH KLFHYLKKQG FPL*



ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap:



10 20 30 40 50 60 

orf 10-1 . pep MDTKE I LGYAAGS IGS AVLAVI I LPLLSWYFPADDIGR I VLMQTAAGLT VSVLCLGLDQA 
II 11 I 11 I 1 I I I M I M ! M I I I I 1 M 1 I I I I I I I I I I II M I I I I I I I It I I I I I I I I I 
orflOng-1 MDTKEILGYAAGSIGSAVIAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 4 0 50 60 

70 80 90 100 110 120 

orf 10-1 . pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I i M II I : 1 I M II I I I I I I I I I I : I I I I I M I M I I I I M M I II i I I I II I I I II I I 
orf 10ng-l ' YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120
130 140 150 160 170 180 

orf 10-1 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 
I i I I I I I I I I I I I I I I i I I I t ! I I I i M I I II I 11 I II 1 I I M I I I II I I : I I I I I ! M I 
orf 10ng-1 LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10-1 . pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY
I I I I I I I M I II I I I II I I I : I II I I I I I I I I I I I I I I : I I I I : I I II I I I I I I I I I I I I 
orf 10ng-1 NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10-1 .pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
I I I I I I I I I I I I I I I I I M I I : II I I I I I I II I I I I I ! I I I I I I I M I I II I I I I I I I I 
orf 1 Ong- 1 AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 10-1 . pep ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLAEISGIGLNWRKTRPIALAT 
I I II I I I II I I I I I I II II II II I I I I I I I I I I I 1 I I : I I I I I I I I I I II I I I I I I I I 
orf 10ng-l ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 10-1 . pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF 
I I II I I I II M I I I I M I I : I I I I I I I II I I I I M : I I I I II I I I I I I I I I I I II : I I I I 
orf 10na-l LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF 

370 380 390 400 410 420 

430 440 450 4 60 470 

orf 10-1 . pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I : I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I : I I I I I I I I I M I I II I 
orf 1 Ong- 1 CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide and several
transmembrane segments and the presence of a leucine-zipper motif (4 Leu residues spaced by 6
aa, shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
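
A motif of this kind can be located with a simple pattern search. The sketch below assumes that "spaced by 6 aa" means six intervening residues between consecutive leucines (L-x6-L-x6-L-x6-L); the function name `find_leucine_zippers` and the demonstration peptide are hypothetical and are not taken from the original analysis.

```python
import re

# Four leucines, each separated from the next by six arbitrary residues
# (L-x6-L-x6-L-x6-L). The lookahead lets overlapping hits be reported.
LEU_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched 22-residue window) pairs."""
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in LEU_ZIPPER.finditer(seq)]


# Hypothetical test peptide, not taken from the sequences in this document.
demo = "MK" + "LAAAAAA" * 3 + "LGG"
print(find_leucine_zippers(demo))
```

Spaces inside a pasted sequence row are removed before matching, so the ten-residue blocks of the listings can be used directly.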

Example 45 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 381>:

1 ..ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG

451 AA.AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA



This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 

1 ..ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF
201 TGCKAAICLP MR*
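
The correspondence between a listed DNA sequence and its amino acid sequence can be checked by conceptual translation. The sketch below assumes the standard genetic code and reports any codon containing a non-ACGT symbol (ambiguity codes or dots, as in the partial sequences here) as X; the function name and this handling of ambiguous codons are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string in the given frame (0, 1 or 2).

    Whitespace is ignored, case is ignored, '*' marks a stop codon and
    any codon that is not fully A/C/G/T becomes 'X'. A trailing partial
    codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # First 30 nucleotides of the partial sequence <SEQ ID 381>.
    print(translate("ATCCTGAAAC CGCATAACCA GCTTAAGGAA"))
```

Run on the first three 10-mers of <SEQ ID 381>, this reproduces the first ten residues listed for <SEQ ID 382>.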

Further work revealed the complete nucleotide sequence <SEQ ID 383>:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA
251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA
351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG
401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA
451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA
501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC
551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA
601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT
651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA
751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT
801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC
851 GTTCTATCGA AAGCAAATAA

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N.
meningitidis:

10 . 20 30 

or f 65. pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

I I I I : I I II I II I : I I I I I I I I I I I I I I 
orf 65a 1 1 AGI LF YLNQSGQNAFKI PVPSKQPAETE ILKPKNQPKEDIQPEPADQNALSEPDAAKE 

30 40 50 60 70 80 



40 50 60 70 80 90 

orf 65. pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
I I I I I I I : I I I M I I I I I I I I I I I I I Mill: I I I I I I I I I I M I I I M I I I 1 I I I I 
orf 65a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 
90 100 110 120 130 140 

100 110 120 130 140 150 

orf 65. pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
I ! II I II I I I I I I I I I I I II I 1 1 I I I I M I I I I I I I I I I I I II I I I I I 1 I I I II I I I 
orf 65a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQILNSGSIEKARSAAAKEVQKM 
150 160 170 180 190 200 



160 170 180 190 200 210 

orf 65. pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 

orf 65a KT PDKAEATH YLQMGAYADRRS AEGQRAKLAI LG I S S KWG YQAGHKT LYRVQSGNMSAD 

210 • 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC
101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA
351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG
401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA
451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA
501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC
551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA
601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT
651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA
751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT
801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC
851 GTTCTATCGA AAGCAAATAA


This encodes a protein having amino acid sequence <SEQ ID 386>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV
101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*


ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap:



10 20 30 40 50 60 

orf 65a. Dep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQ PAETEILKPK 
I I I 11 ! I I I II I II I I I I \ I ! I I I I I I I I I I M M I I M I I I M I : I I I II I I I 1 I I I I 
orf 65-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 65a . oep NQPKED I QPEPADQNALSEPDAAKEAEQS DAE KAADKQPVADKADEVEEKADE PEREKSD 
t I I II II I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I Mill: I 
orf 65-1 NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 65a . pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP 
I I I I I I I I I I I I I I M I M I I I II I I I I I I I t It I I I I I I I I I I I I I II I I I I I I I I I 
orf 65-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 65a . pep TPEQILNSGS IEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI 
I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I : I I I I I I I M I I I I I 
orf 65-1 TPEQILNSGS I EKARSAAAKEVQKMKTSDKAEATH YLQMGAYADRQSAEGQRAK LAI LG I 

190 200 210 220 230 240 

250 260 270 280 290 

orf 65a. pep SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 

I I I I I I I I I I M I I I I I I I I I I II I I I M I I I i I I I I I I I I I I I I I I i I I 

orf 65-1 SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 

250 260 270 280 290 
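
The identity figures quoted for these alignments can be recomputed from any pair of aligned sequences. The sketch below is an illustration under stated assumptions: the two inputs are already aligned to equal length, "-" marks a gap, and identity is counted over all aligned columns. The alignment program behind the figures in this document may count the overlap differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identical columns, compared columns, percent identity).

    Hypothetical helper: both strings must have the same length; columns
    where both sequences carry a gap ('-') are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return identical, len(columns), 100.0 * identical / len(columns)


if __name__ == "__main__":
    # First 60 aligned residues of ORF65a and ORF65-1, as listed above.
    orf65a = "MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK"
    orf65_1 = "MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK"
    print(percent_identity(orf65a, orf65_1))
```

Over this first 60-residue block the two sequences differ at two positions (VP versus AS).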

Homology with a predicted ORF from N. gonorrhoeae

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N. 
gonorrhoeae: 



30 40 50 60 70 80 

ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 
60 I I I : I I | I | I I I : | | | | | M I I II : I I 

ORF65 I LKPHNQLKE D I QPDPADQN ALSE PDAATE 

10 20 30

90 100 110 120 130 140

ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

M I I I M : I I I I I I I I I I I I I I M I I I I II I M I I II I M M H I M I I I I I I I M I I M 
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
5 40 50 60 70 80 90 

150 160 170 180 190 200 

ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 

Mill : | | M I M I I I I I I I I I I I I I I i I I I I I I 11 I I M I I Mi MINIMUM 
10 ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 

100 110 120 130 140 150 

210 220 230 240 250 260 

ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 
15 | | | | | || | | II || M : I I I I I II I I I M I II I : I I : I I M I I M I : M II II I 

ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 



20 



ORF65ng MR 
I I 

ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino 
acid sequence <SEQ ID 388>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ
51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK
201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR
251 DIKRFTACKA AICPPMR*

After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT
51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC
101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga
351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG
401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA
451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga
501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC
551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa
601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT
651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg
701 ccaaACtggc aAtctrgGgc atatctTccg aagtggtcgG CTATCAGGCG
751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc
801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA
851 TCCGTGcgAT TGAAGGCAAA TAA

This encodes the following amino acid sequence <SEQ ID 390>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ
51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK
201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA
251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK *

ORF65ng-1 and ORF65-1 show 89.0% identity in 290 aa overlap:

10 20 30 40 50 60 

orf 65-1 . pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 
I M I II I I I I II I I I I I I i I I 11 I M I I I I I M I I I : M I I I I M I I II I M II M I I 
60 orf 65ng-l MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK 

10 20 30 40 50 60

70 80 90 100 110 120

orf 65-1 . oep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
I I I | I | 1 I I I I I I I II I I I I I : I I I I I I I I I I II I M I I I II I I I II I I I I I II I M II 
orf 65ng-l NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 65- 1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
I I I I I M M I I I I I I I I I I I M II I I I I I If ! I I I I I II I I I 1 I I I I I I I II I I I I I I I I 
orf 65ng-l GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

130 140 150 160 170 180 

190 200 210 220 230 239 

orf 65-1 . pep T PEQ I LN SGS I EKARS AAAKE VQKMKT S DKAEATHYL-QMGAYADRQS AEGQRAKLAILG 
I I I I I I I I I II I I I I I I I I I II I I I : :::::::: : I M I I I I I I 1 I I I 
orf 65ng-l T PE Q I LN S RSI EKARS AAAKE VQKMKNFGQGGSQR 1 1 CKWARMPTVRS AEGQRAKLAILG 

190 200 210 220 230 240 

240 250 260 270 280 290 

orf 65-1 . pep I SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRS IESKX 
I I I : I I I I I I I I I I I I I I I I I I I M I I I 1 I I I I I I I I I I I II I I : II : I I 
orf65ng-l ISS E WG YQAGHKTL YRVQSGNMSADAVKKMQDE LKKHG VAS L IRAIEGKX 

250 260 270 280 290 



On this basis, including the presence of a putative transmembrane domain in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.
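
A putative transmembrane domain of the kind referred to above is commonly flagged by a sliding-window hydropathy scan. The sketch below is an assumption-laden illustration: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cut-off of 1.6, which are conventional choices and are not stated in this document as the method actually used. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (1-based start, mean hydropathy) for windows at or above threshold.

    Residues without a scale value (for example X or *) are scored as 0.0,
    which is an arbitrary choice made for this sketch.
    """
    seq = protein.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(res, 0.0) for res in segment) / window
        if mean >= threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


if __name__ == "__main__":
    # Residues 16-34 of ORF65-1 as listed in <SEQ ID 384>.
    print(hydrophobic_windows("FFFGLILATVIIAGILFYL"))
```

Applied to the full ORF65-1 or ORF65ng-1 sequences, consecutive windows above the cut-off would mark a candidate membrane-spanning stretch; the result is a prediction, not experimental evidence.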



Example 46 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
391>:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG
51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs.s
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC
151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT
201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT
251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC
401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG
501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG
551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT
651 TGCCGTCCTG TGGCTGTAA

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>:

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

Further work elaborated the DNA sequence <SEQ ID 393> as:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG
51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC
151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT
201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT
251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC
401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG
501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT
651 TGCCGTCCTG TGGCTGTAA

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>:

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N.
meningitidis:

10 20 30 40 50 60
orf103.pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI
I { | i | M I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I M II I I I I I
orf103a MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI
10 20 30 40 50 60

70 80 90 100 110 120
orf103.pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
Mi | I | t I! I 1 I I i I I M I 11 I I I ! I I I I I M I II M II I I M I I I I I I I I I M
orf103a GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
70 80 90 100 110 120

130 140 150 160 170 180
orf103.pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP
I M I I | I II I I I I I I M I I I I I I I I I M I I M I I I I I I I I I I I I I I M I I I I I I I I M I I
orf103a NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP
130 140 150 160 170 180

190 200 210 220
orf103.pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX
M I I I M M II M I 1 I I M i I I I I I M I I I I 1 I I M I I I I I I
orf103a NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX
190 200 210 220



The complete length ORF103a nucleotide sequence <SEQ ID 395> is:

1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG
51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC
151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT
201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT
251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC
401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA
451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG
501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG
551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT
651 TGCCGTCCTG TGGCTGTAA

This encodes a protein having amino acid sequence <SEQ ID 396>:

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap: 

10 20 30 40 50 60 

orflC3a.pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
II | I | I M I I I I I I I I I I I I I I I I I I I I I I I I I I M ! I I I I I I I I I I I I I I I I I I I I I 
orf 103-1 MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 103a. pep GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
I I I I I I I ! I I I I I I I I I I I ! I I I I I I I I! I I I I 1 I I I I I I II I I I II II I II II I f M I 
orf 103-1 GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103a. pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

I I I M I I I I I I I I I II II I II I I I I I I I I 1 I I I I I I II I II I I I I I II II I I I I I I I I I I 
orf 103-1 NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103a . pep NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II I II I I I I I I I I I I II I M I II I I I I I I I I I I I I M I I I I 
orf 103-1 NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 






Homology with a predicted ORF from N. gonorrhoeae 

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N.
gonorrhoeae:

orf 103 . oep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI -60 

I I I II I M II M I I I I I I I I II I I I II I I II I I I I II I I I I I I I I I I I I I : I II I I I 
orf 10 3ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

orf 103 . pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

I 1 : I I I I I I : I : I I I I I I I I I I I I I I I : 11 I I I I I M I I I I I I I I I I I I I I I I I I 1 M I I 

orf 103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

orf 103 . pep N PI LNRLLP I KS I PACLAVG I LWGWL PCGLVYSASLYALGSGS AATGGLYMLAFALGTLP 180 

II I II I I I I I I I I I I I M I I I I I I I I I M I I I I I I I I I II I I I I : I I I I I I 11 I I I I I II 

orf 103ng NPILNRLLPIKS I PACLAVG I LWGWL PCGLVYSASLYALGSGS ATTGGLYMLAFALGTLP 180 

orf 103 . pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

I I I I I I I I I I M I I I I I I I I I I I I I I I I I 11 II I I I I II II 
orf 103ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

The complete length ORF103ng nucleotide sequence <SEQ ID 397> is:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG
51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC
151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT
201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT
251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC
401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG
501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT
651 TGCCGTCCTG TGGCTGTAA

This encodes a protein having amino acid sequence <SEQ ID 398>:

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN
51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap:

10 20 30 40 50 60
orf103-1.pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI
I | | | M I II I I I I M I I I I M I M I I I I I I I I I I It I I I M I I I I I I I I I I I I : I I I I I I
orf103ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI
10 20 30 40 50 60

70 80 90 100 110 120
orf103-1.pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
| | : | | | M | : | : 1 I I I I I II I II I I I I : M I II II I I I I M I I 1 I I I I I I 1 I I I I M ! I I
orf103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
70 80 90 100 110 120

130 140 150 160 170 180
orf103-1.pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP
| | | | | M I I I I I | I I I I I I I I I I I I I I I I II I II I I II I II I I I : I I I I I I I I I I I I I I I
orf103ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP
130 140 150 160 170 180

190 200 210 220
orf103-1.pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX
I M M I I I I I I M I I I I I I I I I I I i I I II I I I I I I I I 1 M I I I
orf103ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX
190 200 210 220



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 47 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 399>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCGaGGATT
201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG
251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC
301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG
351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT
401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT
451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA
501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA
551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT
601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT
651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC
701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC
751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC
801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA..

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL
151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV INTLLGHYVM PETFAAP...

Further work revealed further partial DNA sequence <SEQ ID 401>:

1 ATGGAAAA.CC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT
201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT
401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG
601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA
801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA...

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IXXLLGHYVM PETFAAP...

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769)
ORF104 and HI0878 show 40% aa identity in 277aa overlap:

orf104 4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62
Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P
HI0878 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

orf104 63 --KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
K R ++W -«-+L+GV G+++NF+L + L+YI P+ 0+ +S F M++ GVL+F
HI0878 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

orf104 121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
K+++ QKI ++FND+F +GL Y GV+L G++ WV +AQKL+
HI0878 119 KEKLGLKQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

orf104 181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
+ F QQILL++Y A F+P A+ + + + LA +C YCCLNTLIGYGS+ EAL
HI0878 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

orf104 241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
W+ SKVS V TL+P+FT^+ + + HY P FAAP
HI0878 238 NRWDVSKVSWITLVPLFTILFSHIAHYFSPADFAAP 274





Homology with a predicted ORF from N. meningitidis (strain A)

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 104 . pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
MINIMI! II I M M I II N I : II N II N I I I II II II I I I II I! M I I M II I I 
orf104a MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR

10 20 30 40 50 60 

70 80 90 100 110 120






orfl04 pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
M I I I I ( i I I I I I M I I I I M I I 1 M I i I I M I I I I M I I M i I I I 1 I I I i M I I I I M 
orf!04a LPKWRDFSWCSFRLLLLGVAGIS AN FVLIAQGLHYISPTTTQVLWQISPFTMI WGVLVF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 1 04 pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 
M | I I M I I I I I I I I I 1 I I 1 I : II I I I M I I I I I II I I I I I I I I M I I I I I I I I I I I I 

orf104a KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL

130 140 150 160 170 180 

190 200 210 220 230 240 

orf104.pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M : I I I I I II I : M I I M I II I I I M 1 I I I
orf104a SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL

190 200 210 220 230 240 



20 



250 260 270 

orf 104 . pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 

I I I I I I 1 I I I I I I I II I I I I I : I I II I I II : I M I I - 
orf 104a KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVWGGAVTAAVG 

250 260 270 280 290 300 

The complete length ORF104a nucleotide sequence <SEQ ID 403> is:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT
201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT
401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG
601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCTGT TCAAACGCCG CTAG


This encodes a protein having amino acid sequence <SEQ ID 404>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG
301 DRLFKRR*

ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 



55 



10 20 30 40 50 60 

orf 104a. oep MENQRP LLG FALAL LAAMTWGT LP I AVRQVLKFV D APT L VWVR FT VAAAVL FVL LALGGR 
M I II I I I M I II I I I I I I I I I 1 I I I I I II I I I I II I I II I I I II I I I I I I I I I I M I II 
orf 104-1 MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 



60 



70 80 90 100 110 120 

orf 104 a. pep LPKWRDFSWC SFRLLLLGVAG I SAN FVL IAQGLHY I SPTTTQVLWQISPFTM I WGVLVF 
I I 1 I I I I I II I I I I I I II I I II I II I I I II I I II II I I I I II I I II I I I i I I I I I I I I I 
orf 104-1 LPKRRDFS WC S FRLLLLGVAG I SAN FVL I AQGLH YI S PTTTQVLWQI S P FTM I WGVLVF 

70 80 90 100 110 120 



65 



130 140 150 160 170 180 

orf 104a . pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 






I M I I I I I I I I I I I I I I I M It I I II I! I I I I I I I I I M II I! I I I II I II I I [ I I M M 
orfl04-l KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 

190 200 210 220 230 240 

or^l04a . pep SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 
I I I I I I I I II I M I I I I I I II I I I I I I I I I I I I II I.I I I I I I I I I I I I I I I I I M I I I I 
orf 104-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 104a . pep KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALWVGGAVTAAVG 

I M M I I II I I I I I I I I I I I I It I I I I I I : I I I I I 
orfl04-l KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 

250 260 270 
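
The identity figure quoted for an alignment of this kind can be recomputed from the two aligned rows. The following is a minimal sketch, not the program used to produce the alignments; the function name `percent_identity` is hypothetical, and it assumes that gap characters (`-`) and undetermined residues (`X`) never count as identical but still count toward the overlap length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns holding the same residue.

    Assumption: '-' (gap) and 'X' (undetermined) never count as a match,
    but every column counts toward the overlap length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a not in "-X"
    )
    return 100.0 * matches / len(seq_a)


# Residues 261-277 of the two sequences, as printed in the alignment.
orf104a_part = "IFSLLGHYVMPDTFAAP"
orf104_1_part = "IXXLLGHYVMPETFAAP"
print(round(percent_identity(orf104a_part, orf104_1_part), 1))
```

Over the full 277-residue overlap the printed rows differ at five positions, and 272/277 rounds to the quoted 98.2%.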

Homology with a predicted ORF from N. gonorrhoeae

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N.
gonorrhoeae:

orf104.pep    MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR  60
              |||||||||| ||||||||||||| :||||||||||||||||||||||||||||||||||
orf104ng      MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR  60

orf104.pep    LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF  120
              ||||||||| |||||||||:||||||||||||||||||||||||||||||||||||||||
orf104ng      LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF  120

orf104.pep    KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL  180
              ||||||||||||||||:||||:|||||||||||||| ||||||||||||||| |||||||
orf104ng      KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL  180

orf104.pep    SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL  240
              ||||||||||||||||||||||  ||||||||:||||||||::|||||||||||||||||
orf104ng      SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL  240

orf104.pep    KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP  277
              ||||||||||||||||||||| :||||||||:|||||
orf104ng      KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVVVGGAVTAAVG  300






The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a 
protein having amino acid sequence <SEQ ID 406>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*
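
The sequence listings in this document print residues in blocks of ten, five blocks to a row, with the position of the first residue of each row at the left. A minimal sketch of that layout follows; `format_listing` is a hypothetical helper name, and the block and row sizes are simply read off the listings here.

```python
def format_listing(seq: str, block: int = 10, per_row: int = 5) -> str:
    """Lay out a sequence in numbered rows of space-separated blocks."""
    seq = "".join(seq.split())  # drop any whitespace already in the input
    row_len = block * per_row
    lines = []
    for start in range(0, len(seq), row_len):
        row = seq[start:start + row_len]
        blocks = [row[i:i + block] for i in range(0, len(row), block)]
        lines.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(lines)


# First 60 residues of the gonococcal protein listed above.
print(format_listing(
    "MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR"
))
```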

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT
201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT
351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT
401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG
601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt
651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCCGT TCAAACGCCG CTAG

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV
51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT
101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*

ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:

                      10        20        30        40        50        60
orf104-1.pep  MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104ng-1    MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf104-1.pep  LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
              ||||||||| |||||||||:||||||||||||||||||||||||||||||||||||||||
orf104ng-1    LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf104-1.pep  KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
              ||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||
orf104ng-1    KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf104-1.pep  SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
              ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf104ng-1    SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL
                     190       200       210       220       230       240

                     250       260       270
orf104-1.pep  KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP
              |||||||||||||||||||||  ||||||||:|||||
orf104ng-1    KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVVVGGAVTAAVG
                     250       260       270       280       290       300

In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein:

gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
Score = 237 bits (598), Expect = 8e-62

Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%)

Query: 30 QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 88 

Q+p M WG+LPIA++QVL ++A T+VW P 

Sbjct: 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

Query: 8 9 — KRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 14 6 

K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 

Sbjct: 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 

Query: 147 KDRMTAAQKI XXXXXXXXXXMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 206 

K+++ QKI +FFND+F +GL Y+ GV+L G++ WV Y +AQKL+ 

Sbjct: 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178 

Query: 207 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 266

+ F QQILL++Y A F+P A+ + + L LA +CF+YCCLNTLIGYGS+ EAL 
Sbjct: 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237 



Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306
W+ SKVS V TL+P+FT++FS + HY P FAAP++N
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277

Based on this analysis, including the presence of a putative leader sequence and several putative
transmembrane domains in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
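
The text does not say which program was used to predict the transmembrane domains. Purely as an illustration of how such a prediction can be approached, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile. The choice of the Kyte-Doolittle scale, the 19-residue window and the 1.6 cut-off are assumptions taken from common practice, not from this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    # Undetermined residues such as X contribute a neutral 0.0.
    scores = [KD.get(res, 0.0) for res in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows at or above the threshold."""
    return [
        i + 1
        for i, value in enumerate(hydropathy_profile(seq, window))
        if value >= threshold
    ]


# First 60 residues of ORF104ng-1 as listed in this example.
n_term = "MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR"
print(candidate_tm_windows(n_term))
```

Windows flagged this way are only candidates; a hydropathy plot cannot by itself distinguish a leader sequence from a membrane-spanning segment.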

Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT
51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT
101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG
151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG
201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA
251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC
301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG
351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG
401 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT
451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg
501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT
551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC
601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT
651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG
701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT
751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG
801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC
851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC
901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG...

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 

1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR
51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD
101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES
151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF
201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT
251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH
301 NEILYVFDAV LP...
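
The partial DNA sequence contains IUPAC ambiguity codes (for example `y`, `s`, `m`, `k`, `r`) and undetermined bases, and the corresponding protein shows `X` at the affected positions. The following sketch shows one way to reproduce that behaviour with the standard genetic code. It is an illustration only: `translate` is a hypothetical helper, and it assumes that any codon containing a character other than A, C, G or T is reported as `X`.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate in frame 1; codons with ambiguous bases become 'X'."""
    dna = "".join(dna.split()).upper()  # tolerate the 10-base block spacing
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


# Codons for residues 121-130 of ORF105, taken from the partial sequence.
print(translate("CTGTATCTGAACGGTCysCCTTTGGGCAAC"))  # LYLNGXPLGN
```

Applied to the first 30 bases of the complete sequence <SEQ ID 411>, the same function returns `MPTVRFTESV`.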

Further work revealed the complete nucleotide sequence <SEQ ID 411>:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC
51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA
101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG
151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG
201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC
251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC
301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA
351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA
401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC
451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG
501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA
601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT
651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC
701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG
751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT
801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG
851 AGTGGCTGGA CGGCATACGT TTATAG

This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>:

1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN
101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR
151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*



Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N.
meningitidis:

                60        70        80        90       100       110
orf105.pep    ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES
                                            ||||||||||||:|||||||||||||||||
orf105a                                     MPTVRFTESVSKHDLDALFEWAKASYGAES
                                                    10        20        30

               120       130       140       150       160       170
orf105.pep    CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH
              ||||||||| |||||||||:||| ||||||| ||||||||||||||||| ||||||  |:
orf105a       CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK
                      40        50        60        70        80        90

               180       190       200       210       220       230
orf105.pep    CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR
               |||| |||:|||||||||:||||:||||  || ||||||||||||:|||||||||||||
orf105a       EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR
                     100       110       120       130       140       150

               240       250       260       270       280       290
orf105.pep    SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS
              ||||||||:||||| |||||:||:|||:||||||||||||||||||||||||||||| ||
orf105a       SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS
                     160       170       180       190       200       210

               300       310
orf105.pep    RGVHNEILYVFDAVLP
              ||||||||||||||||
orf105a       RGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMHDAQLVTLDAF
                     220       230       240       250       260       270

The complete length ORF105a nucleotide sequence <SEQ ID 413> is:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC
51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA
101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG
151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG
201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC
251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC
301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA
351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA
401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC
451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG
501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA
601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT
651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC
701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG
751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT
801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG
851 AGTGGCTGGA CGGCATACGT TTATAG






This encodes a protein having amino acid sequence <SEQ ID 414>:

1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD
101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR
151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*

ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap:

                      10        20        30        40        50        60
orf105a.pep   MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG
              ||||||||||||:||||||||||||||||||||||||||||||||||||:||||||||||
orf105-1      MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf105a.pep   CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA
              ||||||||||||||||||| ||||||  |: |||| |||:|||||||||:||||:|||||
orf105-1      CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf105a.pep   FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVSSGELPSETVC
              ||||||||||||||||:|||||||||||||||||||||:|||||||||||:||:|||:||
orf105-1      FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf105a.pep   RESSEEAGLDKTLLPLIRPVSQLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
              ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf105-1      RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
                     190       200       210       220       230       240

                     250       260       270       280       290
orf105a.pep   FEKMDIGGLLAAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX
              |||||||||| ||||||||||||||||||||||||||||||||||||||||
orf105-1      FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX

                     250       260       270       280       290

Homology with a predicted ORF from N. gonorrhoeae

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N.
gonorrhoeae:

orf105.pep    MVARRAHNPKVVGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER  60
              |||||||||||||||| ||| :|||||||| ||      |||||||||||||||||||||
orf105ng      MVARRAHNPKVVGSNPAPATKYQTPRFNAEGVLF-----FLFPAASVFCRIFLPAAISER  55

orf105.pep    QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT  120
              |:|||||||||||||||||| ||||:|||||||||||||||||||| |||||||||||||
orf105ng      QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT  115

orf105.pep    LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL  180
              ||||  |||||||||:||: ||||||| |||:||||||||||||||||||||  |: |||
orf105ng      LYLNRLPLGNLSPEWAERIKKDWEAGCSESSNGIFLNADGWPDMGGRLQHLARTWNKAGL  175

orf105.pep    LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK  240
              | |||||||||||||||||||||||  || ||| ||||||||:||:||||||||||||||
orf105ng      LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK  235

orf105.pep    AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH  300
              ||||:||||  :|||||||||||||||||||||||||||:|||||||:||||| ||||||
orf105ng      AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH  295

orf105.pep    NEILYVFDAVLP  312
              ||||||||||||
orf105ng      NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYG  355

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a
protein having amino acid sequence <SEQ ID 416>:

1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA
51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF
101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF
151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA
201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV
251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY
301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA
351 FYRYGLIDAA HPLSEWLDGI RL*



Further work revealed the complete nucleotide sequence <SEQ ID 417>:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC
51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA
101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT
151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG
201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC
251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC
301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA
351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA
401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC
451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG
501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA
601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT
651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC
701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG
751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT
801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG
851 AGTGGCTGGA CGGCATACGT TTATAG



This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-1>:

1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA
51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN
101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR
151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV
201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L*



ORF105ng-1 and ORF105-1 show 93.5% identity in 291 aa overlap:

                      10        20        30        40        50        60
orf105-1.pep  MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG
              |||||||||||||||||||| ||||||||||||||||| ||||||||||:||:|||||||
orf105ng-1    MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf105-1.pep  CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA
              ||||||||||||||||||||||||||  |: |||| ||||||||||||||||||||||||
orf105ng-1    CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf105-1.pep  FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC
              ||||||||||||||||:||:||||||||||||||||||:|||| |:||||||||||||||
orf105ng-1    FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf105-1.pep  RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
              |||||||||||||:|||||||:||||| ||||||||||||||||||||||||||||||||
orf105ng-1    RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG
                     190       200       210       220       230       240

                     250       260       270       280       290
orf105-1.pep  FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX
              ||||||||||||||| |||||||||||||| ||||||||||||||||||||
orf105ng-1    FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX
                     250       260       270       280       290

Furthermore, ORF105ng-1 shows homology with a yeast enzyme:

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (Z98533) thiamin
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569
Score = 105 bits (259), Expect = 4e-22
Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)

Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441
N G+ WRNE + + P+ +ER F FG LS VH + + W+
Sbjct: 96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621
RRSP K P LDN GG++ G+ + +E SEEA LD + LI P + ++
Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798
R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274

Query: 799 LDAFYRYGLIDAAHP 843
LD R+G+I HP
Sbjct: 275 LDFLIRHGIITPQHP 289
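
The summary line of a BLAST hit such as the one above can be checked mechanically. The sketch below parses the `Identities`, `Positives` and `Gaps` fractions and recomputes each percentage. It assumes the summary-line layout shown for this hit; the helper names are hypothetical, and the use of integer truncation is an observation that happens to fit this hit, not a documented rule.

```python
import re

# Matches e.g. "Identities = 64/192 (33%)" in a BLAST summary line.
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line: str) -> dict[str, tuple[int, int, int]]:
    """Return {name: (count, alignment_length, printed_percent)}."""
    return {
        name: (int(count), int(total), int(pct))
        for name, count, total, pct in STAT.findall(line)
    }


def truncated_percent(count: int, total: int) -> int:
    """Percentage rounded down to a whole number."""
    return (100 * count) // total


summary = "Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)"
for name, (count, total, printed) in parse_blast_stats(summary).items():
    print(name, truncated_percent(count, total), printed)
```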

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 49 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
419>:

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG
51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT
101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT
151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT
201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA
251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA
301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA
351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG
401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT
451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG
501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT
551 TCCTATCCGC .CAATGA

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF
51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL
101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT
151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N.
meningitidis:

                      10        20        30        40        50        60
orf107.pep    MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a       MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf107.pep    TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT
              |||||||||||||||||||| |||||| ||| ||||||||||||||||||| ||||||||
orf107a       TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf107.pep    EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a       EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
                     130       140       150       160       170       180

                    189
orf107.pep    KYRFLSXQX
              ||||||
orf107a       KYRFLSANDAVPKQEMMNVKAELLEQKAKLDAYRREEVGLLQEIRTQNLTLXSLPQAAX
                     190       200       210       220       230

The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 



1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG
51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT
101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT
151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT
201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA
251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA
301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA
351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG
401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT
451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG
501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT
551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG
601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA
651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC
701 TCCCCCAAGC GGCATGA






This encodes a protein having amino acid sequence <SEQ ID 422>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF
51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL
101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT
151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK
201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA*

Homology with a predicted ORF from N. gonorrhoeae

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N.
gonorrhoeae:

orf107.pep    MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT  60
              ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf107ng      MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT  60

orf107.pep    TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT  120
              |:|||||||||||||||||| |||||||||| ||||||||||||||||||||||||||||
orf107ng      TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT  120

orf107.pep    EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ  180
              |||||||||||||||||||| ||||||||||||||||:|||||||||||||||||||||:
orf107ng      EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR  180

orf107.pep    KYRFLSXQ  188
              |||||| |
orf107ng      KYRFLSAQ  188

The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a
protein having amino acid sequence <SEQ ID 424>:

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF
51 LIFGNYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL
101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT
151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ*

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.

Example 50

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
425>:

1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT-TTGCC
51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA
101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT
151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA
201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC
251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT
301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT
351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG
401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG
451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA
501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 

1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI
51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC
101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ
151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y*

Further work revealed the following DNA sequence <SEQ ID 427>: 

1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC
51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA
101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT
151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA
201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC
251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT
301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT
351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG
401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG
451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA
501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 

1 MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI
51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC
101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ
151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y*
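
A sequence "believed to be complete" can be given a quick structural check: it should start with ATG, have a length that is a multiple of three, end in a stop codon and contain no earlier in-frame stop. The sketch below illustrates that check; the function names are hypothetical and the short test string is a shortened stand-in, not the full sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(dna: str) -> bool:
    """True if dna is ATG ... stop, in frame, with no internal stop codon."""
    dna = "".join(dna.split()).upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


def encoded_residues(nucleotides: int) -> int:
    """Residues encoded by a complete ORF of this length (stop excluded)."""
    return nucleotides // 3 - 1


# Shortened illustrative ORF: ATG CTG AAA TAT TGA
print(is_complete_orf("ATGCTGAAATATTGA"))
# The 546-base sequence above corresponds to a 181-residue protein.
print(encoded_residues(546))
```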

Computer analysis of this amino acid sequence gave the following results: 
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Homology with a predicted ORF fr om N gonorrhoeae 

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORP108.ng) from N. 



gonorrhoeae: 

orf 108 .pep 



MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60 
| j : | M | 1 || | | I || I II I I I I I I : I I I I I I II I I I II I I I I I I I MINI 

o r f 1 0 8 ng MLKI PFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYI DNTAI AGLALGQS S E 60 

orf 108 pep GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120 
|] II IMMIMM II M Ml:: M 1111:11111 MM1II 11:1:1111111111 

10 or f 108ng GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAWGKCMETDGKDAPSGWAENGVCHT 120 

orfl08 peo LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

M I I I M I I I I I I I I I I I I I : I I : I 1 1 I I It M II i I I I I M I I I I I I I I I I 11 M II M I 

orfl08ng LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 






ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap: 

orf108-1.pep  MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60
              |||  ||||||||||||||||||||||||||:||||||||||||||||||||| ||||||
orf108ng      MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
              |||||||||||||||||||||::|| ||||:||||| ||||||| ||:|:||||||||||
orf108ng      GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
              ||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||||
orf108ng      LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
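
The percent identity figures quoted for these alignments are the fraction of aligned positions carrying the same residue. As a hedged illustration, the Python sketch below computes that fraction for an ungapped alignment; `percent_identity` is a hypothetical helper name, and only the first 60 residues of the two proteins are compared, so the result differs from the full-length figure.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 1-60 of ORF108-1 and ORF108ng, taken from the listings in this example.
orf108_1 = "MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE"
orf108ng = "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE"
print(f"{percent_identity(orf108_1, orf108ng):.1f}%")  # 93.3% over these 60 residues
```

Four of the 60 positions differ here (residues 4, 5, 32 and 54). Alignments that contain gaps would need a real alignment step first, which this sketch does not attempt.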

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC
51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA
101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT
151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA
201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA Aacgccgtcc
251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT
301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT
351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG
401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG
451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA
501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA

This encodes a protein having amino acid sequence <SEQ ID 430>:

1 MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI
51 AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC
101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ
151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y*

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
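
A motif such as the ATP/GTP-binding site motif A can be located by pattern matching. The Python sketch below is illustrative only and assumes the PROSITE-style consensus [AG]-x(4)-G-K-[ST] for the P-loop; the text does not state which pattern definition or software was used, so the pattern and the variable names are assumptions.

```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST] written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# Residues 1-70 of the gonococcal protein (SEQ ID 430).
orf108ng = "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSEGKTNDGKKQI"

for m in P_LOOP.finditer(orf108ng):
    # Report 1-based residue positions, as in the sequence listings.
    print(m.start() + 1, m.end(), m.group())  # 56 63 GQSSEGKT
```

A short pattern of this kind matches by chance fairly often, so a hit is a hint for further analysis rather than evidence of nucleotide binding.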

Example 51

The following DNA sequence was identified in N. meningitidis <SEQ ID 431>:
1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG 

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI 

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D*

Further work revealed the following DNA sequence <SEQ ID 433>: 



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC
51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG
251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA



This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
251 RNPLYQMIVS MF*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 109. pep MEDLYI I LALGLVAMIAGFI DAI AGGGGL I TLPALLLAGI PPVSAIATNKLQAAAATFS A 
I I I ] I I I II I I I M II I I I I I I I I I II I I I I I I 1 I I I I I I I I I I I I I I II I I I I I I i I M 
orf 10 9a MEDLYI ILALGLVAMIAGFI DAI AGGGGLITLPALLLAGI PPVSAIATNKLQAAAATFSA 

10 20 30-40 50 60 



70 80 90 100 110 120 

orf 109 . pep TV S FARKG L I D WKKGL P I AAAS FVGGV AG AL S V S LV S KD I LLA W P VLL I FV AL Y FV FS P 
I I I I I I I II II I M I I II I I I II : I I I : I I I I I I I I I I M I I I I I I I I I I I I I I II I M I 
orf 10 9a TVSFARKGLIDWKKGLPI AAAS FAGGWGALSVSLVSKD I LLAWPVLLI FVALYFVFSP 

70 80 90 100 110 120 






130 140 150 160 170 180 

orfl09.pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 

I I I I I I I I I I II I I I I M I I I ' M 
orf 109a KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 

130 140 150 160 170 180 

The complete length ORF109a nucleotide sequence <SEQ ID 435> is: 



1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC
51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG
251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA



This encodes a protein having amino acid sequence <SEQ ID 436>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
251 RNPLYQMIVS MF*

ORF109a and ORF109-1 show 99.2% identity in 262 aa overlap:



10 20 30 40 50 60 

orf 10 9a pep medlyiilalglvamiagfidaiagggglitlpalllagippvsaiatnklqaaaatfsa 
M | | | | | U I I I I I 1 M I II I I M I I I I M M I I M I I I I I I I 1 I I I I I I M I I I I I I I I 

orfi 09-1 medlyiilalglvamiagfidaiagggglitlpalllagippvsaiatnklqaaaatfsa 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10 9a pep TVSFARKGLI DWKKG LP I AAASFAGGVVGALSVSLVSKD I LLAWPVLLI FVALYFVFSP 
i | | | I M I M | I I M I I M I I I I : I I I : I I I I I I I I I II I I I II I I I I I i I I M I II I M 
orf 109-1 TVSFARKGLI DWKKGLPIAAASFVGGVAGALSVSLVSKDI LLAWPVLLI FVALYFVFSP 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 109a pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 
i i I I M I I I I I I I I 1 I M I M I I M I M M I i I I I I I I I 1 I I I II I I I I I I I 1 I I I I I I I 
orf 109-1 KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 109a. pep LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI 
| | | | | | | | | | | | | M I I I II I M I M I I I I I I I I I I I I I I I M I I I I M I I I I I I I I I I I 
orf 109-1 LANVACNLGSLSVFLLHGS 1 1 FPI AATMAVGAFVGANLGARFAVRFGSKLIKPLLIVI S I 

190 200 210 220 230 240 

250 260 
orf 109a . pep SMAVKLLI DERN PLYQM I VSMFX 
I I I I M I I I I I I II i I 1 ! I I I I I 
orf 10 9-1 SMAVKLLI DERNPLYQMI VSMFX 

250 260 






Homology with a predicted ORF from N. gonorrhoeae 

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N. gonorrhoeae:



orf109.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109ng    MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep  TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120
            |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109ng    TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep  KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180
            ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf109ng    KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep  IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231
            |||||||||||||||||||||||||||||||||||||||||||| ||||||
orf109ng    IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231



An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino
acid sequence <SEQ ID 438>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VATAFGFLRR
151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD
201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D*

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>:

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC
51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG
251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA



This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-1>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
251 RNPLYQMIVS MF*

ORF109ng-1 and ORF109-1 show 98.9% identity in 262 aa overlap:

10 20 30 40 50 60 

orfl09ng-l .pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 
I I I I M I 1 I I I I II II M I I I M I I I I I I I I I I I I I I II I I I I I I II I I M I I I I I I I I I 
orf 109-1 MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 

10 20 30 40 50 60 

70 80 90 . 100 • 110 120 

orf 10 9ng-l.pep TVS FARKGLIDWKKGLPIAAASFAGGWGALSVSLVSKDI LLAWPVLLI FVALYFVFSP 




I M It I I I I M I I I M l I I I I i t : I I I : I I I I I I I t I I I I I I M N II I I I I I I f I I I I I 
orf 109-1 TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAWPVLLIFVALYFVFSP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orfl09na-l pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 

9 HIM II I MM I INN Ml Ml I I HI III I MM Mill I MUM MUM 

orf 109-1 KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 

130 140 150 160 170 180 



190 200 210 220 230 ^ 240 

orfl09na-l pep LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI 
M I I I M M I M i I I MM I I i I I: Ml I M 1 M I i I I II M M I M M I I M I I M M I 
orf 109-1 LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI 
15 190 200 210 220 230 240 

250 260 
orf 109ng-l .pep SMAVKLLIDERNPLYQMIVSMFX 
I I I I II M II I M M II II I II I 
20 orfl09-l SMAVKLLIDERNPLYQMIVSMFX 

250 2 60 

In addition, ORF109ng-1 shows homology to a hypothetical Pseudomonas protein:

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3'REGION (ORF9)
>gi|94984|pir||I38164 hypothetical protein 9 - Pseudomonas sp >gi|551929
(M62866) ORF9 [Pseudomonas denitrificans] Length = 261

Score = 175 bits (439), Expect = 3e-43

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%)

Query: 41  PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100
           PP+  + TNKLQ             R+G ++ K+ LP+                    D+
Sbjct: 43  PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF
           L A++P LLI +ALYF   P + G  +  +R++ F+F LT+ PL+GFYDGVFGPG GSFF
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA
           ++ F+ L G  +L A ++TK  N   N+G+  VFL  G++++ +   M +G F+GA +G+
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254
           R+A+  G+K+IKPLL+++SI++A++LL D  +PL
Sbjct: 222



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
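
Putative transmembrane domains are often screened for with a sliding-window hydropathy profile. The Python sketch below uses the Kyte-Doolittle scale with a 19-residue window; the text does not say which prediction method was actually used, so the scale, the window size, the 1.6 cut-off and the name `hydropathy_profile` are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Residues 1-60 of ORF109ng-1 (SEQ ID 440).
n_term = "MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA"
profile = hydropathy_profile(n_term)
# Windows averaging above roughly 1.6 are commonly read as candidate membrane spans.
candidates = [i + 1 for i, score in enumerate(profile) if score > 1.6]
print(candidates)
```

The printed list gives the 1-based start positions of windows above the cut-off. The window covering residues 1-19 averages about 1.88, which is consistent with a hydrophobic N-terminal segment.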

Example 52 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>: 

1 ..CTGCTAGGGT ATTGCATCGG TTATCGGTAC GgCTGTTGCA GCAAAACCAG
51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT
101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA
151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG
201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA
251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC
301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC
351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT
401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC
451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC
501 GGTCGGATTG TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC
551 CCGAAAGTAT .TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA
601 TATTTCCG.A GGGGCAGAgT GCGGATGTGG TTTTCCTGA

This corresponds to the amino acid sequence <SEQ ID 442; ORF110>:

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI
51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP
101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL
151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY
201 FXRGRVRMWF S*

Computer analysis of this amino acid sequence gave the following results: 

Homology with ORF88a from N. meningitidis (strain A)

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis:

10 20 30 40 50 60 

orf 88a . pep MSKSRRSPPLLSRPWFAFFSSMRF AVALLSLLGIASVIGTVL QQNQPQTDYLVKFGSFWA 

I I I I I M I I I : M I I I I I I I I I I I I I I I I I 
15 orfllO LLGIASVIGT LL QQNQPQTD YLVKFGSFWA 

10 20 30 

70 80 90 100 110 120 

orf 88a .pep QIFGFLGLYDVYASAW FWIMMFLVVSTSLCLI RNVPPFW REMKSFREKV KEKSLAAMRH 
20 I I I I M i M I I I I I I i I I I I M II I I I II I I I I I I I I I I I M I I i II I II I I I I I I I I I 

orfllO XI FG FLGLYDVYASAW FWIMMFLWSTSLCLI RNVPPFWREMKS FREKVKEKS LAAMRH 

40 50 60 70 80 90 

130 140 150 160 170 180 

25 orf 88a . pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWG YIFAHVALIVICL 

I j I I I I I I I I I I I I I I I I I M I I I I I I I I I i I M I I I I I I I I M M M I II It I I I I I I I 

orfllO SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWG YIFAHVALIVICL 

100 110 120 130 140 150 

30 190 200 210 220 230 240 

orf 88a .pep GGLI DSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 

j I I I I I 1 I I I I I I ! 1 I I I I : : : I I I I : I 

orfllO GGLI DSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 

160 170 180 190 200 210 



35 



250 260 270 280 290 300 

orf 88a . pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 



orfllO SX 

However, ORF88 and ORF110 do not align, because they represent two different fragments of the
same protein.

Homology with a predicted ORF from N. gonorrhoeae 

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N.
gonorrhoeae:

orf110.pep                                LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
                                          ||||||||||:||||||||||||||| ||:
orf110ng    MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep  XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
             || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110ng    RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
            |||||||||||||||||||:||||||::||||||||||||||||||||| ||||||||||
orf110ng    SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180

orf110.pep  GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210
            | ||: |||||||||:| |||: || |||| ||||  :| ||||| |||||| ||:|||||
orf110ng    GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240

orf110.pep  S 211
            |
orf110ng    S 241

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a
protein having amino acid sequence <SEQ ID 444>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXNLLL KLGMLAGSIF
201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S*

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>: 



1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG
101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT
201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG
251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG
351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT
401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT
651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG
701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG
751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA
801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC
851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG
901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC
951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG
1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC
1051 CGCTAA



This corresponds to the amino acid sequence <SEQ ID 446; ORF111>:

1 MPSETRLPNF IRVLIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE
201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM
301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL
351 R*



Computer analysis of this amino acid sequence gave the following results: 




Homology with a predicted ORF from N. meningitidis (strain A)

ORF111 shows 96.9% identity over a 351aa overlap with an ORF (ORF111a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 111a . pep MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP 
I I I I I I I I I I I I : I I M I : I I I M I I I I I I i I I I I I I I I I I I I M I I I I i I II II I I I I 
orf 111 MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 111a. pep AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH 
III! I I I I II I I I I I I I I I I I I I I I 11 I I I I I M I I I I I I I I I I H M M I I : I I I I I I 
orf 111 AE IQKRI DDALKEVNRQMSTYQPDSE I SRFNQHTAGKPLRI SS DFAHVTAEAVRLNRLTH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 111a . pep GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 
I I I M I I I M I I I I I I M I I I I I I I I I I I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 111 GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 111a . pep AYLDLSSIAKG FGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ 
I I I I II N II I I I I I I I M M I I I II I I I I I I I I I I I I I I I I I I I I I M I I I I I I I M 
or fill AYLDLS S I AKGFGVDKVAGELEKYG IQN YLVE IGGE LHGKGKNARGE PWRIG IEQPNI VQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 111a. pep GGNTQI IVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM 

I Mill MIIIMI I I I I I I I I I I M I : I I I I I I I I I I I I I I I I I I I I I II I I I I I M 
orf 111 GGNTQI IVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISWADSAM 

250 260 270 280 290 300 



310 320 330 340 350 

orf Ilia . pep TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 
MM I M I I I I I I I I i I I M I I I I M M I I I II I I I I i I I II I I I I I I I I I 
orf ill T ADGLSTG LFVLGETEALKLAEREKLAVFL I VRDKGGYRTAMS SE FEKLLRX 

310 320 330 340 350 

The complete length ORF111a nucleotide sequence <SEQ ID 447> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC
51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG
101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT
201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG
251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG
351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT
401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT
651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG
701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG
751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA
801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC
851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG
901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC
951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG
1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC
1051 CGCTAA

This encodes a protein having amino acid sequence <SEQ ID 448>: 



1 MPSETRLPNF IRTLIFALSF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE
201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL
251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM
301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL

351 R*
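
The X residues in <SEQ ID 448> line up with codons containing an undetermined base (N) in <SEQ ID 447>. The Python sketch below illustrates one way such codons can be handled: a codon with N is reported as X unless every possible completion encodes the same amino acid. The function name `translate_ambiguous` is hypothetical, and only N is handled, not the other IUPAC ambiguity codes.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1; a codon containing N becomes X unless every
    possible completion encodes the same amino acid."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        options = product(*(BASES if base == "N" else base for base in codon))
        residues = {CODON_TABLE["".join(o)] for o in options}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Nucleotides 151-180 of SEQ ID 447 (codons 51-60 of SEQ ID 448).
print(translate_ambiguous("TCAAATAATC GGGACNAACT CCCNTCACCT"))  # SNNRDXLPSP
```

In this stretch the codon NAA yields X, whereas CCN still yields proline because all four CCx codons encode it, in agreement with residues 51-60 of the listed protein.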

Homology with a predicted ORF from N. gonorrhoeae

ORF111 shows 96.6% identity over a 351aa overlap with a predicted ORF (ORF111.ng) from N. gonorrhoeae:

                    10        20        30        40        50        60
orf111ng    MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
            |||||||||:||:|||||||||||||||||||||||||||||||||||||||||||||||
orf111      MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf111ng    AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
            |:|||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf111      AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf111ng    GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK
            |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf111      GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf111ng    AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ
            ||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||:|
orf111      AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf111ng    GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAM
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||:||||
orf111      GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                   250       260       270       280       290       300

                   310       320       330       340       350
orf111ng    TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX
            ||||||||||||||||||:|||:|||||||||||| |||||||||| |||||
orf111      TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                   310       320       330       340       350

The complete length ORF111ng nucleotide sequence <SEQ ID 449> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg
101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT
201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG
251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG
351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT
401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT
651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG
701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg
751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA
801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac
851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG
901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC
951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG
1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC
1051 CGCTAA

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE
201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM
301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL
351 R*

This protein shows homology with a hypothetical lipoprotein precursor from H. influenzae:

sp|P44550|YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi|1074292|pir| 4
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20)
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346

Score = 353 bits (896) , Expect = 9e-97 

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%) 

Query: 7   LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66
           +  LI  +I     + L AC ++T + ++L G+TMGTTY VKYL +      S  K  +
Sbjct: 1   MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58

Query: 67  IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125
           I+  LK+VN +MSTY+ DSE+SRFNQ+T    P+ IS+DFA V AEA+RLN++T GALDV
Sbjct: 59  IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185
           TVGP+VNLWGFGP+K   ++P+PEQ+ +  ++ GIDKI L   K+ A+LSK  P+ Y+DL
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245
           SSIAKGFGVD+VA +LE+   QNY+VEIGGE+  KGKN  G+PW+I IE+P        +
Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238

Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305
            ++ LNN  +A+SGDYRI+  ++NGKR +H I+P    PI H+LASI+V++ ++MTADGL
Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349
           STGLFVLGE +AL +AE+  LAV+LI+R  +G+ T  SS F KL
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 54 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 451>:

1 ..CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA
51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG
101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG
151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG
201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG
251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG
301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA
351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA
401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG
451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA
501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG
551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC
601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC
651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA
701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG
751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA..

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 

1 PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE 

51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ 

101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA 

151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG 

201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR 

251 FGIEAGWKGH MSA. . 
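The X at position 60 of ORF35 lines up with the ambiguous codon `CkA` at nucleotides 178-180 of SEQ ID 451, while the next codon, `yTG`, is still read as L. The following Python sketch illustrates why; it assumes the standard genetic code and the usual IUPAC nucleotide ambiguity symbols, and the helper name `possible_residues` is hypothetical (it is not part of the original disclosure).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity symbols and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def possible_residues(codon: str) -> set:
    """Return every amino acid an ambiguous codon could encode (hypothetical helper)."""
    choices = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*choices)}

# Codons taken from row 151 of SEQ ID 451 (reading frame starts at nucleotide 151).
print(sorted(possible_residues("CkA")))  # k = G or T: two different residues, hence X
print(sorted(possible_residues("yTG")))  # y = C or T: a single residue either way
```

Under these assumptions `CkA` can encode either L or R (the strain A sequence ORF35a has R at the corresponding position), whereas `yTG` encodes L for both readings.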

Computer analysis of this amino acid sequence gave the following results: 



Homology with putative secreted VirG-homologue of N. meningitidis (accession number A32247)
The ORF and the virg-h protein show 51% aa identity in a 261 aa overlap:

Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 63
+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I
virg-h 396 KNSDIFDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNESNQLSI 455

Orf35 64 GVMGGRAGQHASVNGKG--GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH 121
G+MGG+A Q ++ + ++ G+G GVYA WHQL+ DKQTGAY D W+QYQRF+H
virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 515

Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 181
RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 575

Orf35 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL 241
SE V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ +
virg-h 576 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI 635

Orf35 242 AGRTALEGRFGIEAGWKGHMS 262
+TA+E + G+ K H++
virg-h 636 NNKTAIESQLGVAVKIKSHLT 656
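The identity figures quoted for alignments such as the one above can be checked directly from the aligned rows. The following Python sketch is illustrative only: it assumes that identity is counted as identical, non-gap columns divided by the total number of alignment columns (the convention of BLAST-style reports such as "Identities = 181/344"), and the function name `identity_stats` is hypothetical.

```python
def identity_stats(row_a: str, row_b: str, gap: str = "-"):
    """Count identical columns in two aligned rows of equal length.

    Returns (identical_columns, total_columns, percent_identity).
    Assumption: a column counts as identical only if both rows carry
    the same residue and that residue is not a gap symbol.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
    columns = len(row_a)
    return identical, columns, 100.0 * identical / columns

# Last block of the Orf35 / virg-h alignment (Orf35 residues 242-262).
orf35_block = "AGRTALEGRFGIEAGWKGHMS"
virg_h_block = "NNKTAIESQLGVAVKIKSHLT"

identical, columns, percent = identity_stats(orf35_block, virg_h_block)
print(identical, columns, round(percent, 1))  # prints: 6 21 28.6
```

Applying the same count to all blocks of an alignment and summing the columns gives the overall identity; the result depends on the gap convention chosen, so small differences from the reported percentages are possible.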

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N. meningitidis:



10 20 30

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG

: ! I I ! I I I I I I I I II I I I I I I II 11 I I ! 
orf35a QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG

310 320 330 340 350 360

40 50 60 70 80 90

orf35.pep GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV
MINI I I t i M I I I I I I I I I I M 1 I I I I I I I I I I I 1 I I I I II I I I I I I I : I I I I M 
orf35a GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV
370 380 390 400 410 420

100 110 120 130 140 150

orf35.pep YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV

I I | I I I I | I I I I I I I I ! I I I I I I I I I I I I I I M I I I ! I I I I I I I I I I I I I I I M 1 I I I : I 
orf35a YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV
430 440 450 460 470 480

160 170 180 190 200 210

orf35.pep GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN
I 1 I U I I I I M I I I M I I I I I II I I I M II I I I II II I I I I I M I I I I I I I I I I i I I I I I 
orf35a GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN
490 500 510 520 530 540

220 230 240 250 260

orf35.pep LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA
I | | | | | I I I II I M M I It I M I I I I I I I M I 1 M I M I I II M I I I I I 
orf35a LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSARIGYGKRTDGD

550 560 570 580 590 600

orf35a KEAALSLKWLFX
610 620

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 



1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA
51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT
101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC
151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA
201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT
251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC
301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA
351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA
401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG
451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA
501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA
551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA
601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC
651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG
701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA
751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA
801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT
851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC
901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA
951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC
1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT
1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG
1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG
1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG
1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC
1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC
1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC
1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA
1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG
1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG
1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA
1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG
1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG
1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA
1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC
1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA
1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG
1851 GCTGTTTTGA


This encodes a protein having amino acid sequence <SEQ ID 454>:

1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD
51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP
101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ
151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE
201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG
251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC
301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR
351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM
401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY
451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP
501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ
551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG
601 YGKRTDGDKE AALSLKWLF*

Homology with a predicted ORF from N. gonorrhoeae



ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N. gonorrhoeae:

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34

I : I I I I I I : I : I : : I 

orf35ngh FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370

orf35.pep GAA-ADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKG--GAAGSDLYGYG 91

: I : : I : i I i I I : I I t I ! : 111:: I : I I : I I I : I I : : : : : : : = = I : I 

orf35ngh KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430

orf35.pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151

: I II I : I I I I : I I I I I M : I : I : I I I i I : I I I i I : M : : I I II I : i : M I M : I I 
orf35ngh AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 490

orf35.pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211

: : III::! I II I I I I : II I I II I I : I I I : : I : I I I I I I I I : I : : II : : I I : I 
orf35ngh HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550

orf35.pep GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263

I I : : I I I : I I : : : : I I I I I : I I : : : : : : : I : : I : : I : I I : I : : 
orf35ngh GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610

A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 



1 ..KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS
51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK
101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT
151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID
201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK
251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT
301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL
351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI
401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD
451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR
501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN
551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL
601 TLQASFNRQT SKHHHAKQGA LNLQWTF*

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 55 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>:



1 .GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG
51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG
101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA
151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA
201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA
251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA
301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA
351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A


This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 . . AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 
51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 

101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 

Further work revealed further partial nucleotide sequence <SEQ ID 459>: 



1 .GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC
51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT
101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC
151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA
201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG
251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT
301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG
351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG
401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA
451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC
501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC
551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG
601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT
651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>:

1 . . AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI 

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N. gonorrhoeae:

orf46.pep AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45
I II I M I I II I I I I I I I II I I I 11 I I M M 
orf46ng PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217

orf46.pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105
I | | | | | t I I I I I I I I I I I II I I I M I I I I I I II I I M I I I I M I I I I : I I I I I I II I I I I 
orf46ng EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGHSLTRGDV 277

orf46.pep RVIQQTSAPDKHGXLSSDSGN 126
I I I I I I I I I I I II I I I i I i I 
orf46ng RVIQQTSAPDKHGVLSSDSGN 298



A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having
partial amino acid sequence <SEQ ID 462>:



1 .RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC
51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER
101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV
151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD
201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL
251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN*



Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>:

1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG
51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC
101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA
151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG
201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg
251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa
301 ttccattcgc ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC
351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT
401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT
451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT
501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC
551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA
601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC
651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA
701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT
751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC
801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC
851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC
901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT
951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA
1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG
1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA
1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC
1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC
1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT
1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA
1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT
1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT
1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA
1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA
1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG
1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC
1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA
1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA
1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA
1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT
1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC
1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG



This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>:

1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK
101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP
301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG
401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP
451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK
501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE
551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD
601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE*


ORF46ng-1 and ORF46-1 show 94.7% identity in a 227 aa overlap:

10 20 30 40

orf46-1.pep AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER
I M t M I I I I I MM I I I I I I I I I I I II I I I I I I I I I I I I M I 
orf46ng-1 LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR
10 20 30 40 50 60

50 60 70 80 90 100

orf46-1.pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP
: : ! I I I I : I I I I I I I : I I I I 11 1 : : I II 1 I M I t I 1 i I I : M I I I I I I M I I I I 1 M I I 
orf46ng-1 NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP
70 80 90 100 110 120

110 120 130 140 150 160

orf46-1.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS

I I I I I I I I I I I II I II I I I I i I II I I I I I I I I I I I I I I I M i I I I I I M I I I I I I I I I I I 
orf46ng-1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
130 140 150 160 170 180

170 180 190 200 210 220

orf46-1.pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
I I I M I I 11 I II I : t I 1 I I I II I I I I I I I M I I 1 1 I M I I I 11 I I M I I I I i M I II 11 I 
orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
190 200 210 220 230 240

orf46-1.pep I
I 
orf46ng-1 IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP
250 260 270 280 290 300



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF46ng-l shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of 
N. meningitidis: 

10 20 30 40 50 60
orf46a.pep LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER

I i I I I I I I I I I I I I I I ! I I I I I I i I I I I M I I I I I M i 

orf46ng-1 LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR

10 20 30 40 50 60

70 80 90 100 110 120

orf46a.pep SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP
: | | | I | M M M I) I : I : M I I I : M I I I I I I i I I I 1 I : I M I I I I I I I M i I I I I I I 
orf46ng-1 NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP
70 80 90 100 110 120

130 140 150 160 170 180

orf46a.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
I I M | | I | M I I I M ! I M I I I t t I I I I I M I I I t I I 1 I I M M I I I I I I I I I 1 I I I I t I 
orf46ng-1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS

130 140 150 160 170 180

190 200 210 220 230 240

orf46a.pep TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
l M I I : I I M I : I : I I I I I I H I I I M I I I I II I I I I I M I I I I I I I I I 1 I I I I I I I I I I 

orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE

190 200 210 220 230 240

250 260 270 280 290 300

orf46a.pep IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP

! | | | | | M | M I I I I II I I I I I I II I M I I I II I I II I I II I I I I I I I I I I II I I I I I I I 
orf46ng-1 IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP

250 260 270 280 290 300

310 320 330 340 350 360

orf46a.pep NAAQGIEAVSNIFTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF
j 1 I I I 1 I I I 1 I I I I : I I : I I I I I I I I I I I II M I I I II ! I I M I I I I I I I M I I I I I I 
orf46ng-1 NAAQGIEAVSNIFMAAIPIKGIGAVRGKYGLGGITAHPVKRSQMGAIALPKGKSAVSDNF

310 320 330 340 350 360

370 380 390 400 410 420

orf46a.pep ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK
I | | | I | I M I M I I II I I II I I I I I I I I I I I M I I I I I I M I 1 M I : : I I I I I I I I I I I 
orf46ng-1 ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK
370 380 390 400 410 420

430 440 450 460 470

orf46a.pep GFPNFEKDVKYDTRINTAVPQVN PIDEPVFN--PKGSVGSAHSWSITARIQYAKLP

| | | 1 1 | | M | | | : : : : : : : I : I I I : I : I : : : I : I I i 

orf46ng-1 GFPNFEKHVKYDTKLD--IQELSGGGIPKAKPVFDAKPRWEVDRKLN-KLTTREQVEKNV

430 440 450 460 470

480 490 500 510 520 530

orf46a.pep RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ
:: I I 

orf46ng-1 QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS
480 490 500 510 520 530



The complete length ORF46a DNA sequence <SEQ ID 465> is: 

1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG
51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC
101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA
151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG
201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA
251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA
301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC
351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT
401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT
451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT
501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC
551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA
601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC
651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA
701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT
751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC
801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC
851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC
901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT
951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA
1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG
1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA
1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC
1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA
1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT
1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA
1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT
1351 AATCCTAAAG GTTCTGTCGG ATCGGCTGAT TCTTGGTCTA TAACTGCCAG
1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC
1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA
1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA
1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC
1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT
1651 GGAAAGATTA CACACAAATG A


This corresponds to the amino acid sequence <SEQ ID 466>: 



1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE
101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP
301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG
401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF
451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG
501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID
551 GKITHK*


Based on this analysis, including the presence of an RGD sequence in the gonococcal protein, typical
of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
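The RGD sequence referred to above can be located with a simple substring scan. The following Python sketch is illustrative only; the function name `find_motif` is hypothetical, and the fragment used is residues 501-540 of the gonococcal protein <SEQ ID 464> as listed in this example.

```python
def find_motif(sequence: str, motif: str = "RGD", first_position: int = 1):
    """Return the 1-based start positions of every occurrence of `motif`.

    `first_position` is the residue number of sequence[0], so that a
    fragment can be scanned while reporting full-length coordinates.
    """
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + first_position)
        start = sequence.find(motif, start + 1)  # allow overlapping hits
    return positions

# Residues 501-540 of SEQ ID 464 (ORF46ng-1), spaces removed.
fragment = "TGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTSAPD"
print(find_motif(fragment, "RGD", first_position=501))  # prints: [527]
```

A motif match of this kind only flags a candidate; it does not by itself show that the tripeptide is surface-exposed or functional.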



Example 56

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG...

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>:

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGL. . . 
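The correspondence between a nucleotide listing and its amino acid sequence can be reproduced by translating codon by codon. The Python sketch below is illustrative only: it assumes the standard genetic code, reads from the first nucleotide in frame, and renders any codon containing an ambiguity symbol as X; the function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Blanks used to group the listing into blocks of ten are ignored,
    a trailing incomplete codon is dropped, and codons that are not
    plain A/C/G/T triplets are reported as 'X'.
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

# First thirty nucleotides of SEQ ID 467, grouped as in the listing.
print(translate("ATGAATATTC ACACCCTGCT CTCCAAACAA"))  # prints: MNIHTLLSKQ
```

The output matches the first ten residues of ORF48 <SEQ ID 468>. Partial sequences that do not begin on a codon boundary would need the reading frame adjusted before translation.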

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT
51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG
101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT
151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT
201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC
251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC
301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC
351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG
401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG
451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG
501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG
551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG
601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA
651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT
701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG
751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT
801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG
851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC
901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA
951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG
1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC
1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC
1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA
1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC
1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT
1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA
1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC
1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT
1401 GAACTTCAAA ATCAAATAA


This corresponds to the amino acid sequence <SEQ ID 470; ORF48-1>:

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV
151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL
251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVAWLNFK IK*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N.
meningitidis:

10 20 30 40 50 60

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
I I I I I I I I I 1 I M I I I I I I I I I! I I I I i I I I I I I I I I I ! M ) i I I I I I I I I i I i I I I I 
orf48a MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI

10 20 30 40 50 60

70 80 90 100 110 119

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL
Mill 111 It I I I I I I I I I II I M I I I I II I I I I I I I II I I I I I I I I I M I I I I 
orf48a ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL

70 80 90 100 110 120

orf48a LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA

130 140 150 160 170 180

The complete length ORF48a nucleotide sequence <SEQ ID 471> is:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT
51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG
101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT
151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT
201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC
251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC
301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC
351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG
401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG
451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG
501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG
551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG
601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA
651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT
701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG
751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT
801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG
851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC
901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA
951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG
1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC
1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC
1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA
1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC
1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT
1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA
1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC
1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT
1401 GAACTTCAAA ATCAAATAA



This encodes a protein having amino acid sequence <SEQ ID 472>:

   1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN
  51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI
 101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV
 151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
 201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL
 251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR
 301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC
 351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
 401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
 451 NLNETFRYLK QGHVXWLNFK IK*

ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap:

                    10        20        30        40        50        60
orf48a.pep  MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
            ||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||||
orf48-1     MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf48a.pep  ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
            ||||| ||| |||| ||||||||||||||||||||||||||||||| |||| ||||||||
orf48-1     ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf48a.pep  LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
            ||||||||||||||||||||||||:|||||:|||||||||| ||||||||||||||||||
orf48-1     LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf48a.pep  KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48-1     KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf48a.pep  ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR
            |||||||||||||| ||||||||||||||||:||||||||||||||||||||||||||||
orf48-1     ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf48a.pep  CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48-1     CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf48a.pep  LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ
            ||||||| |||||||||||||||||||||||||||||||||||||||||| |||||||||
orf48-1     LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
                   370       380       390       400       410       420

                   430       440       450       460       470
orf48a.pep  FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX
            |||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf48-1     FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX
                   430       440       450       460       470

Homology with a predicted ORF from N. gonorrhoeae 

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. gonorrhoeae:

orf48.pep   MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60
            ||||:|||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng     MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

orf48.pep   ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119
            |||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf48ng     ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120
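
By way of illustration only, the identity figure quoted for an ungapped overlap can be re-derived by counting matching positions. The sketch below is not the program used to produce the alignments in this specification; the function name `percent_identity` is hypothetical, and the two strings are the 119-residue overlap of ORF48 and ORF48ng as listed above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two equal-length, ungapped sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same overlap")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# 119-residue overlap taken from the alignment above (assumed transcription).
ORF48 = (
    "MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI"
    "ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL"
)
ORF48NG = (
    "MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI"
    "ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL"
)

mismatches = sum(a != b for a, b in zip(ORF48, ORF48NG))
identity = round(percent_identity(ORF48, ORF48NG), 1)
```

Under these assumptions the two sequences differ at three positions, which is consistent with the 97.5% figure stated for this overlap.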

The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino acid sequence <SEQ ID 474>:

   1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
  51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
 101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
 151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN
 201 PYASMGNGG..

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>:

   1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT
  51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG
 101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT
 151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT
 201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC
 251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC
 301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC
 351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG
 401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG
 451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG
 501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG
 551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG
 601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG
 651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT
 701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG
 751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT
 801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG
 851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC
 901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA
 951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG
1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC
1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC
1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA
1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC
1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT
1251 GCACACCCAA TtcttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA
1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC
1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT
1401 GCACTTCAAA ATCAAATAA

This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-1>:

   1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
  51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
 101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
 151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
 201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL
 251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR
 301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC
 351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
 401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG
 451 NLNETFRYLK QGHVAWLHFK IK*

ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap:

                    10        20        30        40        50        60
orf48-1.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
            ||||:|||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf48-1.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf48-1.pep LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
            |||||||||||||||:||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf48-1.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
            ||||||||||||||||||||||||||||||||||||:|:||||||||||||||||||:||
orf48ng-1   KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf48-1.pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf48-1.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
            ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
orf48ng-1   CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf48-1.pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
                   370       380       390       400       410       420

                   430       440       450       460       470
orf48-1.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX
            |||||||||:|||||||||||||||||||||||||||||||||||||:|||||
orf48ng-1   FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX
                   430       440       450       460       470

Based on this analysis, including the presence of a putative leader sequence (double-underlined) and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
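
The specification does not state which algorithm was used to predict the transmembrane domains. Purely as an illustrative sketch, a sliding-window hydropathy profile on the Kyte-Doolittle scale is one common way to flag candidate membrane-spanning stretches; the window length (19) and threshold (1.6) below are conventional assumptions rather than values taken from this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy meets the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# First 60 residues of ORF48ng-1 as listed above (assumed transcription).
N_TERMINUS = "MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI"
profile = hydropathy_profile(N_TERMINUS)
starts = candidate_tm_starts(N_TERMINUS)
```

A hydrophobic stretch flagged in this way is only a candidate; distinguishing a cleavable leader sequence from a true transmembrane segment requires further analysis.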
Example 57

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

   1 ..GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT
  51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG
 101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC
 151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA
 201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC
 251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC
 301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA
 351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT
 401 TGATCAATAT GTACGCC..

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 



   1 ..VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA
  51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA
 101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA..
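
For illustration, the correspondence between a listed nucleotide sequence and its amino acid sequence can be checked by translating in frame with the standard genetic code. The sketch below is an assumption about how such a check might be done, not the procedure used in this specification; `translate` is a hypothetical helper, and any codon containing an undetermined base (shown as `.` or `N` in the listings) is reported as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; spaces are ignored, unknown codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# First 30 bases of SEQ ID 477 as listed above.
first_ten = translate("GTGAGCGGAC GTTACCGCGC TTTGGATCGC")

# A stretch of SEQ ID 477 containing an undetermined base ('.').
with_gap = translate("CTGGGCGC.GTAGCGCCG")
```

With these inputs the first call reproduces the opening residues VSGRYRALDR of SEQ ID 478, and the second yields LGXVAP, showing how an undetermined base gives rise to the X in the listed protein.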

Further work revealed the complete nucleotide sequence <SEQ ID 479>:

   1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG
  51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG
 101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC
 151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA
 201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC
 251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT
 301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT
 351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT
 401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT
 451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG
 501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA
 551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG
 601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA
 651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA
 701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG
 751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG
 801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT
 851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG
 901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG
 951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA
1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC
1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC
1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC
1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT
1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA
1251 ATGA

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-1>:

   1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
  51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
 101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
 151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
 201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
 251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
 301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
 351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
 401 LTGFTVLFLL NLAGMFK*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of N. meningitidis:

                                                  10        20        30
orf53.pep                                 VSGRYRALDRVSKIIIVTLSIATLAAAGIA
                                          ||||||||||||||||||||||||||||||
orf53a      AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA
          110       120       130       140       150       160

                    40        50        60        70        80        90
orf53.pep   MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53a      MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
          170       180       190       200       210       220

                   100       110       120       130      139
orf53.pep   IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA
            ||:|||||||||||||||||||  :  ||| :|||||||| ||||||||
orf53a      IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV
          230       240       250       260       270       280

orf53a      AFIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFD
          290       300       310       320       330       340



The complete length ORF53a nucleotide sequence <SEQ ID 481> is:

   1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG
  51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG
 101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC
 151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA
 201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC
 251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT
 301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT
 351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT
 401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT
 451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG
 501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA
 551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG
 601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA
 651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA
 701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG
 751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG
 801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT
 851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG
 901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG
 951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA
1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC
1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC
1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC
1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT
1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA
1251 ATGA

This encodes a protein having amino acid sequence <SEQ ID 482>:

   1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
  51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
 101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
 151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
 201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
 251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
 301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
 351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
 401 LTGFTVLFLL NLAGMFK*


ORF53a shows 100.0% identity in 417 aa overlap with ORF53-1:

                    10        20        30        40        50        60
orf53a.pep  MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf53a.pep  FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf53a.pep  MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf53a.pep  IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf53a.pep  AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf53a.pep  TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
                   310       320       330       340       350       360

                   370       380       390       400       410
orf53a.pep  IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
                   370       380       390       400       410

Homology with a predicted ORF from N. gonorrhoeae 

ORF53 shows 92.1% identity over a 139aa overlap with a predicted ORF (ORF53ng) from N. gonorrhoeae:

orf53.pep                                 VSGRYRALDRVSKIIIVTLSIATLAAAGIA 30
                                          ||||||||||||||||||||||||||||||
orf53ng     AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA 91

orf53.pep   MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 90
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng     MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 151

orf53.pep   IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA 139
            ||:|||||||||||||||||||  :  ||| :|||:|||| ||||||||
orf53ng     IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 211

An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino acid sequence <SEQ ID 484>:

   1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA
  51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP
 101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD
 151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA
 201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP
 251 IVLLEKLGGR HRFGRDFLV*

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>:

   1 ..aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC
  51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA
 101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG
 151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT
 201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG
 251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT
 301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT
 351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT
 401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT
 451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT
 501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA
 551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC
 601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT
 651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG
 701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG
 751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG
 801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT
 851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG
 901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG
 951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG
1001 GACTTTTGGC ATAG

This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-1>:

   1 ..KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL
  51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF
 101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI
 151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT
 201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL
 251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG
 301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*

ORF53ng-1 and ORF53-1 show 94.0% identity in 336 aa overlap:

                    60        70        80        90       100       110
orf53-1.pep ILTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA
                                          :|| ||||||||||| ||||||||||||||
orf53ng-1                                 KKSCVYLWVFLILCIASATINAGAVAIVTA
                                                  10        20        30

                   120       130       140       150       160       170
orf53-1.pep AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng-1   AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM
                    40        50        60        70        80        90

                   180       190       200       210       220       230
orf53-1.pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI
            ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng-1   SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI
                   100       110       120       130       140       150

                   240       250       260       270       280       290
orf53-1.pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng-1   FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA
                   160       170       180       190       200       210

                   300       310       320       330       340       350
orf53-1.pep FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG
            |||||||||||||||||||||||||||||||:|||||||:||||||||||||||||||||
orf53ng-1   FIAFACMYGTTITVVDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG
                   220       230       240       250       260       270

                   360       370       380       390       400       410
orf53-1.pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL
            :||:|||||||||||||||||||||||||||::|:||:||||||::||:||:||:|||||
orf53ng-1   AMAELLKFAMIAAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLYLAGFAVLFLL
                   280       290       300       310       320       330

orf53-1.pep NLAGMFKX
            ||:|::
orf53ng-1   NLTGLLAX
Based on this analysis, including the presence of a putative leader sequence (double-underlined) and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>: 

   1 ..TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT
  51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA
 101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG
 151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT
 201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG
 251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC
 301 GTTCCGCCT..

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>:

   1 ..LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
  51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
 101 VPP..

Further work revealed the complete nucleotide sequence <SEQ ID 489>:

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC 

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA
2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT
2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC
2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG
2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG
2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC
2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC
2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA

This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:

   1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
  51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
 101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
 151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
 201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE
 251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS
 301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR
 351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV
 401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN
 451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
 501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
 551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
 601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
 651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
 701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
 751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
 801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
 851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR
 901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET
 951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*


Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF58 shows 96.6% identity over an 89 aa overlap with an ORF (ORF58a) from strain A of N.
meningitidis:

10 • 20 30 40 50 60 

orf 58 .pep LRETAYVLDS FDRY FW ALAGLFFVRAQS EREWMREVS AWQEKKGEKQAELPE I KDGMPD 

: : : I I I I II 1 I I I 1 M I I I I i M I I I I I I I I I I I I I I I I I I I I I I I 
orf 58a MFWIVLIVILLLALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 

10 20 • 30 40 50 

70 80 90 100 

or f 58 . pep FPELALM LFHAVKTAVYWLFVGW RFCRNYLAHESEPDRPVPP 
I I I I II I I I II 1 I I I I I I I I I I i I It I I I I I I II I II I I I I I I 
orf 58a FPEIALM LFHAVKTAVYVJLFVGW RFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD 
60 70 . 80 90 100 110 
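
The transmembrane prediction mentioned above is not tied to a named program in this text. Purely as an illustration, a Kyte-Doolittle sliding-window hydropathy scan (an assumed method, not necessarily the one used for this analysis) flags the hydrophobic N-terminal stretch of ORF58a shown in the alignment. The window length of 19 and the cutoff of 1.6 are the customary Kyte-Doolittle settings, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(seq, window=19, cutoff=1.6):
    """Return (start, mean) for each window whose mean hydropathy exceeds cutoff.

    start is 1-based; residues missing from KD (e.g. X) score 0.0.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean > cutoff:
            hits.append((i + 1, round(mean, 2)))
    return hits

# First 50 residues of ORF58a, taken from the alignment above.
orf58a_nterm = "MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIK"
hits = hydrophobic_windows(orf58a_nterm)
print(hits[0] if hits else "no hydrophobic window found")
```

A window starting at residue 1 exceeds the cutoff, which is consistent with a membrane-spanning segment at the N-terminus; the charged stretch that follows does not.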
The complete length ORF58a nucleotide sequence <SEQ ID 491> is:

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA
501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC
851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC
1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA
1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC
1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG
1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC
1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC
1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG
1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT
1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA
2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT
2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC
2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG
2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG
2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC
2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC
2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA



This encodes a protein having amino acid sequence <SEQ ID 492>:

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL
551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX
601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP XDNA*

ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap:

orf58a.pep
orfSS-1 

orf 58a . pep 
orfS8-l 

orf 58a . pep 
orf 58-1 

orf 58a . pep 
orf56-l 

orf 58a . pep 
orf58-l 

orf 58a - pep 
orf 58-1 

orf 58a . pep 
orf58-l 

orf 58a . pep ' 
orf58-l 

orf 58a . pep 
orf58-l 



10 20 30 40 50 60 

MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

II I I I M I I M I I I I I I I I I I I I I I I I i I I I i I I I I II II f I I I I I I II I I I I I I I M I I 
MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
10 20 30 40 50 60 

70 80 90 100 110 120 

LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTAS DGYSDSGNGT 

I I I |.| I I I I I M I I I II I I I I I I I II I I M I I I I I II I I I I M I I I I I I I I I I I I II I M 
LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
70 80 90 100 110 120 

130 140 150 160 170 180 

EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 

M | ] I I 1 I I I I I I I I I II t I I I I I I II I I II I I I I I I I I M I ! I I i I I I I I I I I I I I I I I 
EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
130 140 150 160 170 180 

190 200 210 . 220 230 240 

EEATRALN SAALRETKKRYIDAFEKNETAVPKVRVSDT PMEGLQIIGLDDPVLQRTYSRM 
I | | | | | | | 1 1 I I I I I I I M I I I M I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I M : I 
EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQI IGLDDPVLQRTYSHM 
190 200 210 220 230 240 

250 260 270 280 290 300 

FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS 

I M I M I I I I I I I I I 1 1 I 1 M I I I I I t I I I I I t I I I I I I I I : M I I I I I ! 11 I M I II II 
FDADKEAFSESADYGFE PYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 
250 260 270 280 290 300 

310 320 330 340 350 360 

QGQSVS DGTAVRDAXRRVSVNLKE PNKATVSAEARISRLI PES RTWGKR DVEMPSETEN 
I I M I I I I II I M I I I I I I I I I I I I I M I I I M 1 I I I M I M : I I I I I I I I M I I I I I I 
QG QS V S DG T AVR DARRR V S VN LKE PN KAT V SAEAR I SRL I PESQTWGKR DVEMPSETEN 

310 320 330 340 350 360 

370 380 390 400 410 420 

VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWWEPPEVPKVPMPAXDIPPPPPVSEIY 

1111:1111111 I I I 1 II Tl I I I I II I I II I I I I I 1 I I I I I I II I II MINIMI 
VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 
370 380 390. 400 410 420 

.430 440 450 460 470 480 

NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET 
M II I I I : I I I M I I I I II I I I I I I II M I I I I I I I M II I : I M I I : I I I I M M I I I I 
NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 

490 500 510 .520 530 540 

EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
M M I I I I I I I M N I I I I I I I I : II I N I I N I I M I I I I I I I I I I I I I M I M I I 
EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 

490 500 510 520 530 540

550 560 570 580 590 600

orf58a pep GATQTEEXLLXNSITIEEKXAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKX 

| | | | | | || MINIM I I M M I M I M M M M M M M M M M M M M M I 
orf 58-1 EATQTEEELLENSITIEEKLAE FKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf58a pep LARSLGVASIRWETILGKTCMGLELPNPiCRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
M I M M M M M M 1 I M M I M M M I M M I M M M I M M M M I M M I M M 
orf58-l LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf58a pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
" ' I I I 1 | I I I I I I I i ! I I I 1 I I I I I I I I I I I I I M I I I I I II I I I I I I I I I I I I I M M I I I 

orf 58-1 TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAbilLSMLFKAAPEDVRMIMIDPKMLELSIY 

670 680 690 700 710 720 

730 740 750 760 770 780 

orf58a pep EG I PHLLAP WTDMKLAANALNWCVNEMEKRYRLMS FMGVRNLAGXNQKI AEAAARGEKI 
I M M II I I I M I I M I M II I I I II I I I ! I II I M I II I M II I I I M I II I I I I I II 
orf 58-1 EGIPHLLAPWTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 

730 740 750 760 770 780 

790 800 810 820 830 840 

orf 56a pep GNPFSLTPDNPEPLXKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
I || I I I II I :• I I M I I M I M I i I I M I I I II I I M M M I I II I M M I II II I I I I I 
orf 58- 1 GNPFSLTPDDPEPLEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
790 800 810 820 830 840 

850 860 870 880 890 900 

orf 58a pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 

I M I I I M I M M II M M I 1 I II I M I I I I I! II I I I I I M I M II M II I M 11 I II 
orf 58-1 QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 

850 860 870 880 890 900 

910 920 930 940 950 960 

orf 58a pep VHGAFASDEEVHRWEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV 
M M M II I II I M M I M II M I I II I I I I 11 MM I M II It II I M II I I I M 
or f58-l VHGAFAS DEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 

910 920 930 940 950 960 

970 980 990 1000 1010 

orf58a.pep VLKTRKAS I SGVQRALRIG YNRAARLI DQMEAEG I VSAPEHNGNRT I LVPXDN AX 

II II M M I 11 M I M II I M I M I II M I M I II M M I I II M M I II I II I 
o r f 5 8 - 1 VLKTRKAS I S GVQRALRIGYNRAARL I DQMEAEG I VSAPEHNGNRT I LV PLDNAX 

970 980 990 1000 1010 

Homology with a predicted ORF from N. gonorrhoeae 

ORF58 shows complete identity over a 9aa overlap with a predicted ORF (ORF58ng) from N. 
gonorrhoeae: 

orf58.pep  ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP 103
                                         |||||||||
orf58ng                                  SEPDRPVPPASANRADVPTASDGYSDSGNG 30

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 494>:

1 ..SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE
51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS
101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR
151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK
201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL
251 IPESRTVVGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA
301 PDAWVVEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA
351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA
401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN
451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV
501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI
551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV
601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN
651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD
701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA
751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML
801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL
851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ
901 MEAEGIVSAP EHNGNRTILV PLDNA*



This partial gonococcal sequence contains a predicted transmembrane region and a predicted
ATP/GTP-binding site motif A (P-loop; double underlined). Furthermore, it has a domain
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng and FtsK
(accession number P46889) shows 65% amino acid identity over a 459 aa overlap:

ORF58ng : 4 67 IEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 52 6 

+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
FtsK* 8 68 VEARLADFRIKADWNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRWEV 927 

20 

ORF58ng : 527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPWTDLGKAPHL 586 

IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
FtsK: 928 I PGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPWADLAKMPHL 987 

25 ORF58ng: 567 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 64 6 

LVAGTTGSGKSVGVNAMI LSML+KA PEDVR IMIDPKMLEL.S+YEGI HLL WTDMK 
FtsK: 98 8 LVAGTTGSGKSVGVNAMI LSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 104 7 

ORF58ng : 64 7 LA^ALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP — 704 
30 AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 

FtsK: 1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPI PDPYWKPGDSMDAQH 1107 

ORF58ng : 705 --LEKLPFIWVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762 
L+K P+IW+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
35 FtsK: 1108 PVLKKEPYIWLVDEFADLMMTVGKK\ 7 EELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167 

ORF58ng : 763 IKANI PTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFAS DEEV 822 

IKANI PTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
FtsK: 1168 I KAN I PTRI AFTVSS KI DSRT I LDQAGAESLLGMGDMLYSG PNSTLPVRVHGAFVRDQEV 1227 

40 

ORF58ng : 823 HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 882 

H W+ K G P YVD IS SE G G G E DP++D+AV V + RKASISG 
FtsK: 1228 HAVVQDWKARGRPQYVDG ITS DS E S EGGAG-GFDGAEELDPLFDQAVQFVTEKRKAS I SG 1286 

45 ORF58ng: 883 VQRALR I G YNRAARLI DQMEAEG I VS APEHNGNRT I LVP 921 

VQR RIGYNRAAR+I+QMEA+GIVS HNGNR +L P 
FtsK: 1287 VQRQFRIGYNRAARI IEQMEAQGIVSEQGHNGNREVLAP 1325 
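
The ATP/GTP-binding site motif A (P-loop) noted above corresponds to the PROSITE pattern PS00017, [AG]-x(4)-G-K-[ST]. The following is a small sketch that scans a peptide for that pattern; the regular-expression rendering and the helper name are illustrative assumptions, and the fragment is copied from the ORF58ng row of the alignment above.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def find_p_loops(seq):
    """Return (1-based start, matched text) for each non-overlapping P-loop hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(seq)]

# ORF58ng residues from the alignment row that contains the conserved GSGKS stretch.
fragment = "LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAP"
print(find_p_loops(fragment))
```

The single hit, GTTGSGKS, lies in the stretch that is identical between ORF58ng and FtsK.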

Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA
501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC
851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC
1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA
1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC
1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG
1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC
1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC
1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG
1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC
1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA
2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT
2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC
2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG
2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG
2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC
2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC
2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG
2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA



This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-1>:

1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI
201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV
401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER
501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*



ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap:

10 20 30 40 50 60

orf 58-1 . pep MFWIVLIVILLIJUuAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
M I I I I M I : I t I I I I t 1 I II I I ! I I I I I I I I I I M I M I I I 1 I I M I I I 1 I I 1 I I M : : 
orf 58ng-l MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS 

10 20 30 4 0 50 60 



70 80 90 100 110 120 

orf 58-1 . pep LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
I I I I I I I I I I I I M I I I I I I I I I M II I I I I I I I I M I I I I I I I M I I I I I I I M I I I I I 
orf58ng-l LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 58-1 . pep EEAETEEAEAAEEEAADTEDIATAVI DNRRI PFDRSI AEGLMPSESEI S PVRPVFKEITL 
lllllt I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mi: II I II I I I M II 
orf58ng-l EEAETEAAEAAEEEAADTEDIATAVI DNRRI PFDRS I AEGLMQSESKTS PVRPVFKEITL 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 58-1 . pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 
I I I I I I I : I II II I I I I I I I I I I I M I I I I I I I I I I I 1 I I I I I I I I I I I I II I I I II : I 
orf58ng-l EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 58-1 . pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 
I I I I I I M I I I I I I I M M I I I I I I I I I I I I II I I I I I I M : I I I II I I I I I I I I I I I I 
orf58ng-l FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58-1 . peo QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I M I II I I M I I I : II I I I M I I I I I I I I I 
orf 58ng-l QGQSVSDGTAVRDARRRVSVNLKE PNKATVSAEARISRLI PESRTWGKRDVEMPSETEN 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf 58-1 . peo VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY 
I I I I I I I I I M I I I I I I I : I I M I I I I I I I I I I I I I I I I I I : I : I II I I I 1 I 1 1 1 I 
orf58ng-l VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 

370 380 390 400 410 420 



430 440 450 460 470 480 

orf 58-1 .peo NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
II I I I I I : I Ml : I I! I I I I I I I I I I 11 I I I I I i I I I I I I I I M I M I I I I M I M I I I 
orf58ng-l NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGOYLSET 

430 440 450 4 60 470 480 

490 500 510 520 530 540 

orf 58-1 . pep EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 
II I I M M I II II I : I I I II M I II I II I M I II I II I II I II I I II I I II I I II II II 
orf58ng-l EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 58-1 .pep EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 
I I 11 I I II i I II M II 1 I II M M I M M II I II I I I II I M I II M I I II I I I I I I I I I 
orf58ng-l EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 58-1 .pep LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
I I I I M I II II I I I I II I I I II M I M M I I II M.I II I I I I M I II I I II M I I I 1 I I I 
orf58ng-l LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf58-l .pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
I I M I I M II I M t M I II I II I M I I II i I I II I I I II I I I M II I I I ! I I II I I II I I 
orf 58ng-l TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
670 680 690 700 710 720

730 740 750 760 770 780

or f 58-1 . pep EGIPHLIJVPVVTDMKIJU^ALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 
III I I M I I M I I I I I I II I I II I I I I I M I M I I M I I I i I I M I I I I \ I I I I I I I M 
orf58ng-l EGITHLIAPWTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 

730 740 750 7 60 770 780 

790 800 810 820 830 840 

orf 58-1 .pep GNPFSLTPDDPEPLEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
I I | I 11 I I I I I ! I I I I I I M I I I I II I I I I I I I I M I I I M I I I II I I I I I I I I I I I M I 
orf 58ng-l GNPFSLTPDDPEPLEKLPFIVWVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 

790 800 810 820 830 840 

850 860 870 880 890 900 

orf 58-1 .pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 

M M I 1 I I I M M I I I I I I I I I I I I I I I I I I I I I M I I II I I I I I I I I I I t I I I I I I I I 
orf 58ng-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 

850 860 870 880 890 900 

910 920 930 940 950 960 

o-f 58-1 .pep VHGAFASDEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 
I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I M I I I I II I I II I I I I I I I I I I I I II 
orf 58ng-I VHGAFAS DEEVHRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV 

910 920 930 940 950 960 

970 980 990 1000 1010 

or f 5 8 - 1 . peD VLKTRKAS I SGVQRALRIGYNRAARLI DQMEAEGI VS APEHNGNRT I LVPLDNAX 
I I I I I I I I M I I II I I I I I I I II I I I I II I I I I I I II M I I I I I I I I I I I I I I I I 
orf5 8ng-l VLKTRKAS I SGVQRALRIGYNRAARLI DQMEAEG I VSAPEHNGNRT I LVPLDNAX 

970 980 990 1000 1010 
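
The identity figures quoted for these pairwise comparisons are the share of aligned columns that hold the same residue. The following is a minimal sketch of that arithmetic over the first 60 aligned residues shown above. The function name is hypothetical, counting gaps and X as mismatches is an assumption, and the alignment program that produced the 97.2% figure is not named in this text.

```python
def percent_identity(a, b):
    """Percent of aligned columns with identical residues.

    a and b are equal-length rows of one alignment; '-' (gap) and 'X'
    (unknown residue) are counted as mismatches in this sketch.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    same = sum(1 for x, y in zip(a, b) if x == y and x not in "-X")
    return 100.0 * same / len(a)

# Residues 1-60 of each sequence, as aligned above.
orf58_1   = "MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA"
orf58ng_1 = "MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS"
print(f"{percent_identity(orf58_1, orf58ng_1):.1f}% over {len(orf58_1)} aa")
```

These 60 columns differ at three positions (10, 59 and 60), so 57 of 60 are identical; the figure for the full 1014 aa overlap is computed the same way over all columns.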

Furthermore, ORF58ng-1 shows significant homology to the E. coli protein FtsK:

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (Dl
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division
protein FtsK [Escherichia coli] Length = 1329
Score = 576 bits (1469), Expect = e-163

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)

Query :
Sbjct : 
Query : 
Sbjct : 
Query: 
Sbjct : 
Query: 
Sbjct : 
Query : 
Sbjct : 
Query: 
Sbjct: 
Query: 
Sbjct : 
Query: 
Sbj ct : 



556 IEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 615 

+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
8 68 VEARLADFTIIKADVA^YSPGPVITRFELNIAPGVKAARISNLSRDLARSLSTVAVRVVEV 927 

616 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPWTDLGKAPHL 675 

IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
92 6 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPWADLAKMPHL 987 

67 6 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 7 35 

LVAGTTGSGKSVGVNAMI LSML+KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 104 7 

7 36 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP — 7 93 

AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 

104 8 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107 

794 --LEKLPFIWWDE FADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851 

L+K P+IW+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
1108 PVLKKEPYIWLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167 

852 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911 

IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227 

912 HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 971 

• H W+ K G P YVD IS SE G G G E DP+ + D+AV V + RKASISG 
1228 HAWQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286 

972 VQRALRIGYNRAARLI DQMEAEG I VSAPEHNGNRT I LVP 1010 

VQR RIGYNRAAR+ 1 +QMEA+GIVS HNGNR +L P 
1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325 
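
The percentages in the BLAST summary that precedes this alignment follow directly from the reported counts. The following is a short sketch that parses such a summary line and recomputes them; the regular expression and function name are illustrative assumptions, and integer truncation is used because it reproduces the printed 65%, 76% and 1%.

```python
import re

def blast_summary(line):
    """Parse 'Name = hits/length' fields and return {name: (hits, length, pct)}."""
    out = {}
    for name, hits, length in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        hits, length = int(hits), int(length)
        out[name] = (hits, length, 100 * hits // length)  # truncated percentage
    return out

line = "Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)"
for name, (hits, length, pct) in blast_summary(line).items():
    print(f"{name}: {hits}/{length} = {pct}%")
```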
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 59 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>:

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC
101 TGCTCGGCCG TGCCGCCGAC GGGC..GTGA TCGCCATCGA TGCCGTGTTG
151 GCATTGGTCG GCTTCTGGGT C

//

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC
951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC
1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC
1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT
1101 GACATTGAAA GGCGGAAAAT GA

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL
51 ALVGFWV

//

301 ...IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR
351 VRSMPSQPFW QAVGKSLTLK GGK*
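
The correspondence between the DNA and amino acid listings above is ordinary translation in the first reading frame. The following is a minimal sketch using the standard genetic code; the helper name is hypothetical, and rendering any codon that contains an undetermined base as X is an assumption made here for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate frame 1 of dna; spaces are ignored, unknown codons become 'X'."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

# First 50 nucleotides of the partial ORF101 sequence listed above.
row1 = "ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG"
print(translate(row1))
```

The 16 complete codons in that row give MIYQRNLIKELSFTAV, matching the start of the ORF101 amino acid listing.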

Further work revealed the complete nucleotide sequence <SEQ ID 499>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC
101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA
151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC
201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG
251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC
301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA
401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC
451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC
501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG
551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC
601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC
651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA
701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT
751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGGCGGA AAATGA

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR
351 SMPSQPFWQA VGKSLTLKGG K*

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF101 shows 91.2% identity over a 57 aa overlap and 95.7% identity over a 69 aa overlap with
an ORF (ORF101a) from strain A of N. meningitidis:






orf 101 .pep 



orf 101a 



orf 101 .pep 



orf 101a 



orf 101 . pep 
orf 101a 



MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX 
I I I I I I I I I I I M I I I I I I t I I I I I I I I I I t I I I II II! I I I I I I I I I I I II I 
MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM 
10 20 30 40 50 

// 

90 100 110 

IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

I i I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 
280 290 300 310 320 330 

120 130 140 150 

LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 
I I I M I I I I : I : : I I I I I I I I I ! I I I I I I I I I I I I I I M I 
LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX 
340 350 360 370 
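
The identity figures quoted for alignments of this kind can be checked by counting identical columns. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, and it assumes that identity is the number of columns carrying the same residue divided by the length of the overlap, with gap columns counted in the overlap. The two strings are the first 57 columns of the N-terminal alignment of ORF101 with ORF101a.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns whose two residues are identical."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # A column counts as identical only if both rows carry the same residue.
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * matches / len(row_a)


orf101 = "MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV"
orf101a = "MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWV"
print(round(percent_identity(orf101, orf101a), 1))  # 91.2
```

Here 52 of the 57 columns are identical, which reproduces the 91.2% quoted above.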



The complete length ORF101a nucleotide sequence <SEQ ID 501> is:

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC
101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA
151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC
201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG
251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC
301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA
401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC
451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC
501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG
551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC
601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC
651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA
701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN
751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGGCGGA AAATGA



This encodes a protein having amino acid sequence <SEQ ID 502>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA
51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX
251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*



ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap:

orf101a.pep  MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60
             |||||||||||||||||||||||||||||||||||| ||| | |||||||||||||  ||
orf101-1     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 60

orf101a.pep  PLLLVLTAFISTLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf101-1     PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120

orf101a.pep  IPWAELRSREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180
             ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf101-1     IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180

orf101a.pep  DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240
             |||||||||| ||:||||||||||||||||||||||||||||||||| ||||||||||||
orf101-1     DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 240

orf101a.pep  IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300
             ||||||||| ||||||||||||| ||||||||||||||||||||||||||||||||||||
orf101-1     IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300

orf101a.pep  LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360
             | ||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101-1     LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 360

orf101a.pep  VGKSLTLKGGK 371
             |||||||||||
orf101-1     VGKSLTLKGGK 371



Homology with a predicted ORF from N. gonorrhoeae

ORF101 shows 96.5% identity in a 57aa overlap at the N-terminal domain and 95.1% identity in a
61aa overlap at the C-terminal domain with a predicted ORF (ORF101ng) from N. gonorrhoeae:

orf101.pep   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57
             ||||||||||||||||||||||||||||||||||||||||| | |||||||||||||
orf101ng     MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59

//

orf101.pep                                 IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 113
                                           ||||||||||||||||||||||||||||||
orf101ng     SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331

orf101.pep   LLPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 153
             ||||||||||:|::|||||||||||||||||
orf101ng     LLPMHIIMFVIAIVLLRVRSMPSQPFWQAVG 362



The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 504>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VG...



Further work revealed the complete nucleotide sequence <SEQ ID 505>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC
101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC
151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC
201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG
251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC
301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA
401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT
451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC
501 CGaatccgGC ATCATGAAAA ACCTGTtcct GcGCGAACAG GACAAAAACG
551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC
601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC
651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta
701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT
751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGgcgGA AAATGA



This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*



ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:

                     10        20        30        40        50        60
orf101-1.pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1   MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf101-1.pep PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV
             ||||||||||||||||||||||||||||||||||||||||||||||||||:|:|||||||
orf101ng-1   PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf101-1.pep IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ
             ||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||||
orf101ng-1   IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf101-1.pep DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
             ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf101ng-1   DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf101-1.pep IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
             |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1   IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf101-1.pep LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA
             ||||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101ng-1   LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA
                    310       320       330       340       350       360

                    370
orf101-1.pep VGKSLTLKGGKX
             ||||||||||||
orf101ng-1   VGKSLTLKGGKX
                    370



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
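
Putative transmembrane stretches of the kind referred to above are commonly located by scanning a protein for windows of high average hydrophobicity. The sketch below is an illustration only and is not stated to be the analysis used here: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cut-off of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for each amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KD.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the cut-off."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]


# First 60 residues of the gonococcal sequence ORF101ng-1 shown above.
n_term = "MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT"
print(hydrophobic_windows(n_term))
```

Windows reported by such a scan are only candidates; the window length and cut-off are conventional choices rather than values taken from this document.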

Example 60 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 507>: 

1 GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG

351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC

401 ATTCGTAA

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 ..GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR

51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN

101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF113 and pspA show 44% aa identity in 179aa overlap:

orf113  GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNVVIAGHGLDARDTDYTRILSYHSKIDA 60
        GGG INA+  TLT+  P    G+L+ F +  G VVI G GLD  D DYTRILS  ++I+A
pspa    GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKVVIGGKGLDTSDADYTRILSRAAEINA 256

orf113  PVWGQDVRVVAGQNDVAATGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120
        VWG+DV+VV+G+N + G + P AIDT LGGMYA
pspa    GVWGKDVKVVSGKNKLDFDG S LAKT AS APS S S DS VT PT VAI DTAT LGGMYA 307

orf113  NKITLISTVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 179
        +KITLIST   A IRN+G+ FA+ G V ++A+GKL N+G I A      +++ A+ V N
pspa    DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA----EITISAQTVDN 362

Homology with a predicted ORF from N. gonorrhoeae 

ORF113 shows 86.5% identity in a 52aa overlap at the N-terminal part and 94.1% identity in a 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae:

orf113                                     GGGFINASCATLTTAKPQYQAGDLSAFKIR 30
                                           |||||||| |||||::|||||||:|:||||
orf113ng     SHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224

orf113       QGNVVIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRVVAGQNDVAATGDAHSPILNNA 90
             |||:|||||||||||||:||||
orf113ng     QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263

orf113                            IDTGKLGGXVCQQNHLDQYGRASRHS 135
                                           ||||||||||||:||||
orf113ng     DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263

The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a 
protein having amino acid sequence <SEQ ID 510>: 

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 61 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 511>:

1 ..TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA

951 TATCACAGGC AAAGAAAAAG GTGTTT..

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>:

1 ..STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS

301 QNTQGSSTYL DRMAGIYITG KEKGV..

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF115 and pspA protein show 50% aa identity in 325aa overlap:

Orf115:    1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60
             STG+S Y E++ +I +G AY+ + + P + NGI +T
pspA:    776 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTVVPWAENGIHPTFT 831

Orf115:   61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120
              LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+
pspA:    832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNKIHKRLGDGYYEQK 890

Orf115:  121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180
             L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV
pspA:    891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950

Orf115:  181 WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239
             WL  + V LPDG TQTVL P+VYVR +  D++G+GALLSGS   I  SG+++N  G IAG
pspA:    951 WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAG 1009

Orf115:  240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299
             R ALI+N   + N+ G +  +     A  DI N G  + AE  LLL A
pspA:   1010 REALILNAQNIKNLQGDLQGKNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068

Orf115:  300 XXXXXXXXXYLDRMAGIYITGKEKG 324
                       + R+AGIY+TG++ G
pspA:   1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093

Homology with a predicted ORF from N. gonorrhoeae 

ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from
N. gonorrhoeae:

orf115.pep                                STGHSEQNYTLPREITRNISLGSFAYESHRK 31
                                           ||| |||||||:||||:||||||||||| |
orf115ng     NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71

orf115.pep   ALSHHAPSQGTELPQSN----------GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81
             |||:|||||||||||||          ||||||| |||||||:||||||||:||||||||
orf115ng     ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131

orf115.pep   DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141
             ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf115ng     DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191

orf115.pep   EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201
             ||||||||||||||||||||||||||||||:||||||||||||||||||||||||||:||
orf115ng     EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251

orf115.pep   VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261
             |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf115ng     VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311

orf115.pep   SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321
             ||||||||||||||:||||||||||||||||:|||: ||||:||||||||||||||||||
orf115ng     SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371

orf115.pep   EKGV 325
             ||||
orf115ng     EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431

An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:

1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 KEKTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL
701 MPWRLPMQVG RLFKQAKAPK K*



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>: 



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG
51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG
101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT
151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA
201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT
251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT
351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC
401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC
451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA
501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC
551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT
1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT
1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA
1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC
1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA
1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA
1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG
1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG
1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC
1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC
1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC
1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG
1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT
1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG
1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG
1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC
1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT
1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG
1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA
1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC
2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG
2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA
2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA
2151 GGCGCACAAA ACTTAG

This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL
701 MPWRLPMQVG RPIKQAKAHK T*



This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa:

                    20        30        40        50        60        70
orf115ng-1.p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK
                                           ||| |||||||:||||:||||||||||| |
orf115                                    STGHSEQNYTLPREITRNISLGSFAYESHRK
                                                  10        20        30

                    80        90       100       110       120       130
orf115ng-1.p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET
             |||:|||||||||||||          ||||||| |||||||:||||||||:||||||||
orf115       ALSHHAPSQGTELPQSN----------GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET
                    40                  50        60        70        80

                   140       150       160       170       180       190
orf115ng-1.p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND
             ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf115       DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND
                    90       100       110       120       130       140

                   200       210       220       230       240       250
orf115ng-1.p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ
             ||||||||||||||||||||||||||||||:||||||||||||||||||||||||||:||
orf115       EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ
                   150       160       170       180       190       200

                   260       270       280       290       300       310
orf115ng-1.p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK
             |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf115       VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK
                   210       220       230       240       250       260

                   320       330       340       350       360       370
orf115ng-1.p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK
             ||||||||||||||:||||||||||||||||:|||: ||||:||||||||||||||||||
orf115       SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK
                   270       280       290       300       310       320

                   380       390       400       410       420       430
orf115ng-1.p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR
             ||||
orf115       EKGV

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 

LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 
L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 

LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 7 96 

LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 12 0 
+G AY+ + AP Q +++P + •+ NGI +T LP SSL+ I 

MGISAYKGY APQQAS DI PGTV VPWAENGIHPTFT LPNSSLFAI 8 40 

NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

GHRRLDGYQN DEEQFKALMDNGATAARSMNLS.VGIALSAEQAAQLTSDIVWLVQKEVKLP 24 0 
G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 2 99 
DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
DGTTQTVLKPKVYVRARPKDM14GQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 
+ N+G++ ADINGIAE LLL A NNI ++S + S+QN QGS 



+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT 



FD+DN+ IR NEVGS+I +T+G+++L + +AAEVGS +G L + A DI + 



+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 



Query : 


1 


Sbjct : 


739 


Qv ery : 


61 


Sbjct: 


797 


Query : 


121 


Sbjct : 


841 


Query: 


181 


Sbjct : 


901 


Query : 


241 


Sbjct : 


961 


Query: 


300 


Sbjct : 


1020 


Query: 


360 


Sbjct: 


1079 


Query: 


420 


Sbjct: 


1139 


Query: 


4 80 


Sbjct: 


1199 




Query: 54 0 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNI IADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

5 Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTES WGSLNGNTLI S AGKHYTQTGST I S S PQGDVGI S SGKI S I DAAQNRYSQESK 137 8 

Query: 659 QT YEQKGLTVAFS S PVT D 676 
10 Q YEQKG+TVA S PV + 

Sbjct: 137 9 QVYEQKGVTVAISVPWN 1396 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 62 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 517>:

1 ..TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA ...

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>:

1 ..SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ ...

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF117 and pspA protein show 45% aa identity in 224aa overlap:

NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63 
40 ++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++ 



45 



50 



Orfll7: 


4 


pspA: 


1173 


Orf 117 : 


64 


pspA: 


1233 


Orfll7: 


124 


pspA: 


1293 


Orf 117 : 


183 


pspA: 


1353 



HETAQSSTFEGKQWLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123 
+ AST +GK+++L +G D + GSN+I + DN T + A N + + + +T+S+S ++ 
NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 12 9; 

QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182 
+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS 
EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESWGSLNGNTLISAGKHYTQTGSTISS 135: 

PEGNNT I YAQS I DI QAAHNKLN SNTTQT YEQKXLT VAFS S PVT D 22 6 
P+G+ 1+ IIAAN++ +Q YEQK +TVA S PV + 






Homology with a predicted ORF from N. gonorrhoeae 

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from 
N. gonorrhoeae: 



10 



15 



20 



orf 117 .pep 
orf 117ng 
orf 117 .pep 
orf 117ng 
orf 117 .pep 
orf 117ng 
orf 117 .pep 
orf 117ng 
orf 117 .pep 
orf 117ng 



SGNNLNAKAAEVSSANGTLAVSANNDINIS 

I ! I I I I I II I I I : I I : I I II I I : I I I : I I 
IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 



30 



480 



90 



AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILGS 
: | I : : : | || I II I I i M I M I I I I I M I I I I I I I I II I II I I II I I I I I I M I I I I I I I 
SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILGS 540 



NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 

I I I | I M I : I I 1 I I I I I II M I I I I M I I I I I I I I I I I I I M I I i I I II I i I M I I I I I I 
NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 

NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 

MINIMUM!! MINI II : I II I I I I I I MUM I I : I : I I I : I I II 

NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 

YEQKX LTVAFS S P VT DLAQQ 

II I i i i I I I I I II M I I I I 

YE QKG LTVAFS SPVTDLAQQAIAV AH KAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 



150 
600 
210 
660 
230 
720 



An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino acid sequence <SEQ ID 520>:

1 . LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKKSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL

701 MPWRLPMQVG RLFKQAKAPK ■K*



Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>:

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA

2151 GGCGCACAAA ACTTAG

This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 
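The correspondence between a nucleotide listing and its amino acid sequence can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code, reading frame 1 and a sequence typed in the 10-base block layout used above; the names `CODON_TABLE` and `translate` are illustrative and are not part of the specification. Any codon containing an ambiguity symbol (such as `w` or `y`) is reported as `X`, mirroring the X residues in the listings.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the layout whitespace
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 30 bases of the gonococcal partial DNA sequence listed above.
print(translate("TTGCTTGTGC AAACAGAAAA AGACGGTTTG"))
```

Run on the first three blocks of the gonococcal sequence, the sketch should print the first ten residues of the corresponding protein listing.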

ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it
shows homology with a secreted N. meningitidis protein in the database:

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%)

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60
L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120
+ G AY+ + AP Q +++P + + NGI +T LP SSL+ I
Sbjct: 797 MGISAYKGY APQQAS D I PGTV VPWAENG IHPT FT LPNSSLFAI 840

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180
P NKGYL+ETDP F +YR+WLGS YML + L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240
G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299
DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSVVDIG-SGAIENRGGLIAGREALILNAQN 1019

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359
+ N+G++ ADINGIAE LLL A NNI ++S +S+QN QGS
Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419
+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479
FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI +
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539
+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598
SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658
++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659 QTYEQKGLTVAFSSPVTD 676
Q YEQKG+TVA S PV +
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 63 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 523>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA

501 CGTGCGCATC GACTTCATCT CCTAT . . .

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK

151 PLITLKELSK VELSWFDVRI DFISY... 
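Before any computer analysis, a numbered listing such as the one above has to be reduced to a plain residue string. The following is a minimal sketch of that clean-up step, assuming each row starts with a position index followed by 10-character blocks; the function name `clean_sequence` is hypothetical. Note that it also discards the `. . .` markers that flag a partial sequence, so that information must be tracked separately.

```python
import re


def clean_sequence(listing: str) -> str:
    """Return the residues of a numbered block listing as one string."""
    residues = []
    for line in listing.splitlines():
        # Remove a leading position index such as "51" (or a split "4 01").
        line = re.sub(r"^\s*[\d\s]*\d\s+", "", line)
        # Keep letters and the stop marker '*'; drop spaces and dots.
        residues.append(re.sub(r"[^A-Za-z*]", "", line))
    return "".join(residues)


listing = """1 MIYIVLFLAV VLAVVAYNMY
51 DGKPSGGSVM MPKPQPAVKK"""
print(clean_sequence(listing))
```

The same helper works for nucleotide rows, since lower-case ambiguity symbols are letters and are therefore preserved.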

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 



1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG

651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA



This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV

401 RTYVLARQSE MLKVGIEPGG KTALRLFS*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf 11 9. pep MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 
I M I M I I I : I M I I I I I I I I I I I I I I I I I I I I I M I I I I M I I II I I t I I I I I I I II 
orf 119a MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 11 9. pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

I I I I 1 I I II I I I I III I I I I I M I I I I I I I I I I I I I I I M I I II I I I I II I I I I I I I I 
orf 119a MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGI IGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 

orf 11 9. pep TVSEPQTGHS ATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 

II I I I I I I I I I I I I I I I I : I II I I M I I II I I I I I I II I I 11111:11111 

orf 11 9a TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

orf 119a AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

190 200 210 220 230 240 
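The identity percentages quoted for these overlaps can be recomputed directly from the aligned residues. The following is a minimal sketch, assuming an ungapped alignment in which the two sequences start at the same position; `percent_identity` and `match_line` are illustrative names. The sketch marks identical positions with `|` only, so it does not reproduce the `:` symbols that the alignment program prints for conservative substitutions, and an ambiguous `X` residue counts as a mismatch.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions over the ungapped overlap."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / overlap


def match_line(a: str, b: str) -> str:
    """Build a '|' / ' ' line marking identical positions."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


# Last block of the ORF119 / ORF119a alignment (residues 121 onwards).
orf119 = "TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY"
orf119a = "TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE"

print(orf119)
print(match_line(orf119, orf119a))
print(orf119a[:len(orf119)])
print(round(percent_identity(orf119, orf119a), 1))
```

This block alone gives 49 identities in 55 positions; the figure quoted in the text is computed over the full 175aa overlap.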



The complete length ORF119a nucleotide sequence <SEQ ID 527> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG

651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 
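A "complete length" nucleotide sequence of this kind can be given a simple structural check. The following is a minimal sketch, assuming the open reading frame starts with ATG, ends with one of the three standard stop codons, and contains no earlier in-frame stop; the name `is_complete_orf` is hypothetical, and alternative start codons are deliberately not handled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(dna: str) -> bool:
    """True if dna is ATG ... stop with no in-frame stop before the end."""
    seq = "".join(dna.split()).upper()  # drop the layout whitespace
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_internal_stop = any(c in STOP_CODONS for c in codons[:-1])
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not has_internal_stop
    )


# Toy input: the first four codons of the listing above plus a stop codon.
print(is_complete_orf("ATGATTTACA TCTAA"))
```

For the listing above, the last row starts at position 1251 and holds 37 bases, giving 1287 bases in total, that is 428 coding codons plus the stop codon.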

This encodes a protein having amino acid sequence <SEQ ID 528>:

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS*
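The row layout used for these listings (a 1-based position index followed by five blocks of ten residues) is easy to regenerate from a plain string, which helps when comparing a cleaned sequence against the printed form. The following is a minimal sketch; the function name `format_blocks` and its default parameters are assumptions made for illustration.

```python
def format_blocks(seq: str, block: int = 10, per_row: int = 5) -> str:
    """Lay out seq as numbered rows of space-separated blocks."""
    row_len = block * per_row
    rows = []
    for start in range(0, len(seq), row_len):
        chunk = seq[start:start + row_len]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(rows)


# First 57 residues of the sequence listed above.
prefix = "MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGG"
print(format_blocks(prefix))
```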

ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap:

10 20 30 40 50 60 

orf l 1 9a pep MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVR DGKPSGGPVM . 

I I I I I I ! I I : I I I I I I I I I I I I I I I I M I I I I 1 I I I 1 I I I I M I I I I I I I I I I II II II 
orf 119-1 MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM. 

10 20 30 40 50 60 

70 80 90 100 110 120 

or^li 9a. pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

I M II I I I II I II I II I I I I I II M I I I I I II I I I I I I I I I I I I I I ! I I II I I M I I I I 
orf 119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 119a . pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

II I M I I I M I M I I M I M : I I I I I I I I I I I i M I I I M I I I I I I I I M I I I I I I I I I 
orf 119-1 TVSEPQTGHSAPKPADAPAKPAPVPQT PAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 119a. pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
i M I I I I I M I I I I I I I I I I I M I I I i M I I I I I I I I M I I I I I M II I I I I I I I I I I II 
or ill 9-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 119a . pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
I I I I I I I I I I : I I I I I I I I I I I I I I I I II I M I 1 I I I I I I I I I I I I I I I' I I 1 I I II I I I I 
orf 119-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 119a . pep AVTGVGFVLE DDGAFHYTDTSGSTMFS I CSLNNEPFTNALLDNQSYKG FSMLLDI PHSPA 

I I I M I I I I I I II II I I I I I I II II I I l-l H I I I i I I M I I I 1 I II I ! M II I I I I I I I I 
orfl!9-l AVTGVGFVLEDDGAFHYTDTSGSTMFS I CSLNNEPFTNALLDNQSYKG FSMLLDI PHSPA 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 11 9a . pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 
I I M I I M I I I I I I I I I I I I 1 I I I II I I 1 I I I I M M I I 1 I I I I I I I M I M I I I 1 I I I I 
orf 11 9-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

370 380 390 400 410 420

orf119a.pep KTALRLFSX
            |||||||||
orf119-1    KTALRLFSX



Homology with a predicted ORF from N. gonorrhoeae 

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from N. gonorrhoeae:

orf119.pep  MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60
            |||||||||:|||||||||||||||||||||||||||||||||| |||||||||||| ||
orf119ng    MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60

orf119.pep  MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120
            |||||||||| |||||  ||||||||||||||||||||||||||||||||| ||||||||
orf119ng    MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120

orf119.pep  TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175
            ||||||||||| ||||| |||:||||||||||||||||||||| |||||:|||||
orf119ng    TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC

551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA



This encodes a protein having amino acid sequence <SEQ ID 530>:

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV

401 RTYVLARQSE MLKVGIEPGG KTALRLFS*



ORF119ng and ORF119-1 show 98.4% identity over 428 aa overlap:



10 20 30 40 50 60 

orfll9ng MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 
I I II II I I I : I I M II I It I II I I M M I II I I M I I I M I M M M M I II I II I I M 
orf 119-1 MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM 

10 20 30 40 50 60 






70 80 90 100 110 120 

orfll9na MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 
V I | | | | | | | | I | I I I I I I I I II M I I II M I II I I I I I M I I M M I I I I ! I M I I I I 

r>rf 119-1 mpkpqpavKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf!19na TVSEPQTGHSAPKPADAPAKPVPVPQT PAKPLITLKELSKVELPWFDVRFDFISYIALTE 

I || | | | | M | | | | 1 I M II I I : M I I I M I II I I I II II I I I I II I I I I I II I I I 

orf 119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

190 200 210 220 230 240 

orfll9nq AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
| | | | | | I | | I I I | I I I II I I I II I I I I I 1 II I I I I I I I I I M I III I I II I I I I I I I I I I 
orf 119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
WJi "" ~ 190 200 210 220 230 240 

250 260 270 280 290 300 

orfll9nq AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
| | | | 1 : I I I I I I 1 M t I I I I M I I I I I I 11 I I I I I I I I I M I I M I I I I II I I I II I I I I 
orf 11 9-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 119ng AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 
I I I M II I M M I II I I M I I I I M I II 1 I M I I I I I I I I I I I I I I I I I I I I I I I M I I I 
orf 119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 119ng GEKTFDDLFMDIAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 
I I | | | M M II I II ! I M I I I I I II I I I I I I I I II I M M II I I I I I I I I I I I I I I I l:l I 
orf 119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEP.GG 
" ~" " 370 380 390 400 410 420 



429

orf119ng KTALRLFSX
         |||||||||
orf119-1 KTALRLFSX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 64

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>:

1 . GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG

151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT

401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA

451 TTGGCACAGG ATTGA



This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 

1 . . ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA

151 LAQD* 






Further work revealed the complete nucleotide sequence <SEQ ID 533>:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG

1151 CATTGGCACA GGATTGA



This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>:

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*



Computer analysis of this amino acid sequence gave the following results:

Homology with the hypothetical protein o648 of E. coli (accession number AE000189)
ORF134 and o648 protein show 45% aa identity in 153aa overlap:

Orf134: 2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61
RHG + DFF N D + + VE TT T++ VVGGIGVMNIMLVSVTERT+EI
o648: 496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555

Orf134: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121
GIRMA+GAR ++ QQFLIEA F+ + + S ++++
o648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154
A CST GI FG++PA AA+L+P+DALA++
o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of N.
meningitidis:

10 20 30 

orf 134 . pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 

I t M M I I I I I It I I I I I II M I i I I I 1 I 
orf 134a GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL 
210 220 230 240 250 260

70 80 90



orfl34 peo ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 

I I ! M I I I I I I I I I II M M I I I I I I I I I I I I I I I I I I M I M I M I M I I II 

orfl34a ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG 
270 280 290 300 310 320 

100 110 120 130 140 150 

or f 134 pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
I I I I I I I I I I I I I I II ! I I I I I I I I I I I M I II I I I I I I I I I I M I M II I II I I I I I I I 
o^-f 134 a LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
330 340 350 360 370 380 



15 



or f 134. pep LAQDX 
Mill 

orfl34a LAQDX 

The complete length ORF134a nucleotide sequence <SEQ ID 535> is:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG

101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG

151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG

1151 CATTGGCGCA GGATTGA



This encodes a protein having amino acid sequence <SEQ ID 536>:

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*



ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap:

orf134a.pep  MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134a.pep  FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134a.pep  RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134a.pep  ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE






orf134a.pep  DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134a.pep  IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134a.pep  STGIGIAFGFMPANKAAKLNPIDALAQDX
             ||||||||||||||||||||||||||||
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX



Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134.ng) from N. gonorrhoeae:

orf134.pep                                 ARHGTEDFFMNNSDXIRQIVESTTGTMKLL  30
                                           |||||||||||||| |||:|||||||||||
orf134ng     GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264

orf134.pep   ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG  90
             |||||||||||||||||||||||||||||||||||||||||||| |||||||||||:|||
orf134ng     ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324

orf134.pep   LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150
             ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
orf134ng     LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384

orf134.pep   LAQD 154
             ||||
orf134ng     LAQD 388



The complete length ORF134ng nucleotide sequence <SEQ ID 537> is:

   1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT
  51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG
 101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG
 151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG
 201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
 251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC
 301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
 351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA
 401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA
 451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
 501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
 551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
 601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
 651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA
 701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC
 751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
 801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA
 851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
 901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
 951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG
1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG
1151 CATTGGCGCA GGATTGA



This encodes a protein having amino acid sequence <SEQ ID 538>:

   1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT
  51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
 101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
 151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
 201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI
 251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
 301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
 351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap:

orf134ng     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSMGTNTISIFPGRG
             |||||||||||||||||||||||||||||||||||||||||||||||:||||||||||||
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134ng     FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
             |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134ng     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134ng     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE
             ||||||||||||||||||||||||||||||||||||||||||:||||||::|||||||||
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134ng     DFFMNNSDSIRQMVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
             ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134ng     IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC
             ||||||||||||||||||||:|||||||||||||||||||||||||||||| ||||||||
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134ng     STGIGIAFGFMPANKAAKLNPIDALAQDX
             ||||||||||||||||||||||||||||
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX
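
The identity percentages quoted for these overlaps can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length and simply counts matching positions; it is not the alignment program used to produce the figures above, and the function name percent_identity is hypothetical. The example segment is residues 221-240 of ORF134ng and ORF134-1 as they appear in the alignment.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions that carry the same residue.

    Assumes both strings are already aligned and of equal length.
    A '-' marks a gap and never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Residues 221-240 of the two proteins, copied from the alignment.
orf134ng = "NTRVAEKGLAELLKARHGTE"
orf134_1 = "NTQVAEKGLTDLLKARHGTE"

identity = percent_identity(orf134ng, orf134_1)
print(f"{identity:.1f}% identity over {len(orf134ng)} aa")
```

Over this 20-residue window the two sequences differ at three positions (R/Q, A/T and E/D), which is where most of the differences in the full 388 aa overlap are concentrated.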

ORF134ng also shows homology to an E. coli ABC transporter:

sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >gi5
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 648

Score = 297 bits (753), Expect = 6e-80
Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%)

MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDISSMGTNTISIFPGRG 60 
M+ +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI ++PG+ 

.qhirf; 2 60 

40 

nno-rvr* 61 ^ - - - - - - 

FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V 

FGDDDPQYQQALKYDDLIAIQKQPWVASATPAVSQNLRLRYNNVDVAASANGVSGDYFNV 

RGLKLETGRLFDENDVKEDAQVW I DQNVKDKLFAD-S DPLGKT I LFRKRPLT VI GVMKK 
G+ G F++ + AQVW+D N + +LF +D +G+ IL P VIGV ++ 

YGKTFSEGNTFNQEQLNGRAQVWLDSNTRRQLFPHKADWGEVILVGNMPARVIGVAEE 

DENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGT 
50 ~ ++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ AE+ L LL RHG 

KQSMFGSSKVLRVWLPYSTMSGRVMGQSWLNSITVRVKEGFDSAEAEQQLTRLLSLRHGK 

EDFFMNNSDSIRQMVESTTGTMKXXXXXXXXXXXWGGIGVMNIMLVSVTERTKEIGIRM 
+ DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EIGIRM 
KDFFTWNMDGVLKTVEKTTRTLQLFLT LVAVISLWGGIGVMNIMLVSVTERTREIGIRM 

AIGARRGNILQQFLIEXXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAASVIGAVA 
A+GAR ++LQQFLIE F+ + + S +++ A 



45 



55 



60 



Query: 


1 


Sbjct : 


260 


Query: 


61 


Sbjct: 


320 


Query: 


121 


Sbjct: 


380 


Query: 


180 


Sbjct: 


440 


Query: 


240 


Sbjct: 


500 


Query : 


300 


Sbjct: 


560 


Query: 


360 


Sbjct: 


620 



CST GI FG++PA AA+L+P+DALA++ 




Based on this analysis, including the presence of the leader peptide and transmembrane regions in
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>: 



1 


. . GGGACGGGAG 


51 


GGCCACTGGC 


101 


TTTCCTTCCT 


151 


CTGCTCCTTG 


201 


CAGCGGTCAG 


251 


CCGGCTGGGC 


301 


GGCTGGCGCG 


351 


GGTTTGGGCG 


401 


TTTATCTGTC 


4 51 


ACGCGCGCCT 


501 


TATGACCGTC 


551 


AGCTTTTCTG 


601 


ATTTTGA 



10 



15 



This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 

   1 ..GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV
  51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP
 101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM
 151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV
 201 F*

Further work revealed the complete nucleotide sequence <SEQ ID 541>:

   1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC
  51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA
 101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA
 151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC
 201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA
 251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT
 301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT
 351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT
 401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA
 451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA
 501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG
 551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG
 601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG
 651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA
 701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT
 751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA
 801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA
 851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA
 901 TAA

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>:

   1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS
  51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV
 101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
 151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT
 201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
 251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR
 301 *

Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from strain A of N. meningitidis:



10 



15 



20 



25 



orf 135. pep 
orfl35a 



orf 135. pep 
orf 135a 



orf 135 . pep 
orf 135a 



orf 135 . pep 



orfl35a 



10 20 30 

GTGAMLLLFYAVTILPLATGVTLSYTSSIF 

I I I I I I I I I I I I I I I I II I I I I I ! I I I I I 
STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF 
50 60 70 80 90 100 

40 50 60 70 80 90 

LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 

j | | i | I | M I I I 1 I M I I ! I I I I I I I It I I I I I I M I II I I I I It M I I I I M I I I I I I I 
LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 
110 120 130 140 150 160 

100 110 120 130 140 150 

VRELS LAGE PGWRWFYLSVTGVAMS SVWATLTGWHTLS FPS AV YLS C I GVSALIAQLSM 
I I M I I I I I II I I I I I t I I I M I M I I I I I I I I I M I I I I I M i I II I I I I I I I I I I I I I 
VRELS LAGE PGWRWFYLSVTGVAMS SVWATLTGWHTLS FPS AVYLSCIGVSALIAQLSM 
170 180 190 200 210 220 

160 170 180 190 200 

TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAVFX 
I I I I I I M I I I I I I I I 1 I I I I I I I I M I M I : I I I I I I 1 I I I I I I I I 

TRA YKVG DKFT VAS LS YMT WFS ALS AAFFLAEELFWQE I LGMC 1 1 1 LSG I LS S I RPTAF 
230 240 250 260 270 280 



orf 135a KQRLQSLFRQRX 
290 300 

The complete length ORF135a nucleotide sequence <SEQ ID 543> is:

   1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC
  51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA
 101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA
 151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC
 201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA
 251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT
 301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT
 351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT
 401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA
 451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA
 501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG
 551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG
 601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG
 651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA
 701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT
 751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA
 801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA
 851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA
 901 TAA

This encodes a protein having amino acid sequence <SEQ ID 544>:

   1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS
  51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV
 101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
 151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT
 201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
 251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR
 301 *

ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 



35 
40 
45 



orf 135a . pep MDTAiCKD I LG SGWMLVAAAC FT IMNVLIKEASAKFALGSGELVFWRMLFS TVALGAAAVL 
60 ' M II I I M I I I I I I I II M II I I I M I I I I I I I I I I I I I I M I M M I I I I I I I I I I I 1 I 

orf 135-1 MDTAKKDILGSGWMLVAAACFT IMNVLIKEASAKFALGSGELVFWRMLFS TVALGAAAVL 






orfl35a.pep 
orf!35-l 
orf 135a. pep 
orfl35-l 
orf 135a. pep 
orfl35-l 
orf!35a.pep 
orf!35-l 



RRDT FRT PHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLS YTSS I FLAVFS FLILKE 

j | | : | | | I | I | | | I I I I I I I I I I M 11 I I I I I I II I I M I ! II II II II II I M I I I I M 
RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

I I I I I I M I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 

RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

I | | | | | | 1 I 1 I I II I I I I M I I M II I M I I I II I I I I I I I I I I I M M I 

WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

VASLSYMTWFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 
| | M I I M I I I I I II It I I I: I I I M I I I I I I I I ! I M I I I I I M I I I II I I i I I 1 I I I I 
VASLS YMT WFS ALS AAFFLGEELFWQE I LGMC 1 I ILSG ILS S IRPT AFKQRLQSLFRQR 



Homology with a predicted ORF from N. gonorrhoeae 

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from N. gonorrhoeae:



orfl35.pep 
orf 135ng 
orf 135 .pep 
orf 135ng 
orf 135 . pep 
orf 135ng 
orf 135 . pep 



GTGAMLLLFYAVTXLPLATGVTLS YTSS I F 30 
I I 1 I I II I I I I I i M I : I I I I I II II I I I 
STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335 



25 



30 



35 



LAVFS FLI LKERI S VYTQAVLLLGFAGVVLLLNPS FRSGQETAALAGLAGGAMSGWAYLK 
| | I I I I I I i I I I it I I I 1 I I I I I I I I I I M M I I I I I I I I I I M M M II M M I II I I 
LA V FS FL I LKER I S VYTQAVLLLG FAG WLLLN P S FRSGQE PAALAGLAGGAMS G W AY LK 



90 



395 



VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150 
| | I I I I I M I M I I I I I II : I I I I I I I I I I I I M I I I I I I I ! I I I I I I I I I I I II I I I I 
VRE LS LAGE PGWR WFYLS ATG VAMS S VWAT LTGWHTLS FPS AVYLSG I GVS ALI AQLSM 



455 



TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAVF 

I (I M M I I 1 II I I I I I I 1 I I I I I I I I M I I M I I II I I I I I I I I M I I : I 
TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCII ISAAF 



201 



506 



orf 135ng 

An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino acid sequence <SEQ ID 546>:

   1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPAAQG
  51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEVVQILRRL
 101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
 151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH
 201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGSGWM LVAAACFTVM
 251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL
 301 NRSMVGTGAM LLLFYAVTHL PLTTGVTLSY TSSIFLAVFS FLILKERISV
 351 YTQAVLLLGF AGVVLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS
 401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLSFPSAVY LSGIGVSALI
 451 AQLSMTRAYK VGDKFTVASL SYMTVVFSAL SAAFFLGEEL FWQEILGMCI
 501 IISAAF*

Further work revealed the following gonococcal sequence <SEQ ID 547>:

   1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC
  51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA
 101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA
 151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC
 201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA
 251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT
 301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT
 351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT
 401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA
 451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA
 501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG
 551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg
 601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG
 651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca
 701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttCCtaTAt gaccgtcGTC
 751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA
 801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA
 851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA
 901 TAA

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>:

   1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS
  51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV
 101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
 151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT
 201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
 251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR
 301 *

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:

orf 135ng-l . pep MDTAKKDILG SGWMLVAAAC FTVMNVLIKE AS AKFALGSGELVFWRMLFSTVTLGAAAVL 
I I I | | I M I 1 I I M I I I I I I I I : I I II M II I ! I I I I I M I I I I I I I I I I M : I I II I II 
orf 13 5-1 MDTAKKDILG SGWMLVAAAC FT IMNVLIKE AS AKFALGSGELVFWRMLFSTVALGAAAVL 

20 orf i 35nc-l .pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 

I I I : I I I I I M I II I II I I I I I I I I I I M I M I I I I : II I M I M I I M II M I I I I I I I 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSS I FLAVFS FLILKE 

orf 135na-l . oep RISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG 
25 ' ' I ! I 1 I I I I I I I I I I > I I I I I I I I I I I I I I I I I I I I II I M I I M I I I 1 I I M II I I I I I 

orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orf l35na-3 . peo WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT 
I I I I J I II : I I I I I I M II II I I I II I I II II I I I I I I II I I I I I I I I I I I I II I M I I ' 
30 orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

or f i 35na-l . Deo VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR 
I M I I I I II II I I I I I I I I I I I I I I I I I I I M I II I I I II I I I I I I 1111111:11111 
orf 13 5-1 VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
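
How a putative transmembrane domain might be flagged can be sketched with a sliding-window hydropathy profile. This is an illustration only: the text does not say which program produced the prediction, and the sketch swaps in the Kyte-Doolittle scale with the commonly used 19-residue window and 1.6 threshold as assumptions. The function names are hypothetical, and residues missing from the scale (such as X) are scored as neutral, which is also an assumption.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Unknown residues (e.g. 'X') score 0.0; this is an assumption.
    Returns an empty list if the sequence is shorter than the window.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


# Residues 101-150 of ORF135-1, used here only as sample input.
segment = "TLSYTSSIFLAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQE"
print(candidate_tm_windows(segment))
```

Runs of consecutive start positions in the output indicate a hydrophobic stretch long enough to span a membrane; a dedicated predictor would normally be used for any real assignment.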

Example 66 

The following DNA sequence was identified in N. meningitidis <SEQ ID 549>: 

   1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT
  51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA
 101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT
 151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC
 201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG
 251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG
 301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT
 351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC
 401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC
 451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA
 501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA
 551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC
 601 CATCAT^TCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG
 651 GCTTTCTgcC JcTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG
 701 GAATAG

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>:

   1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY
  51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR
 101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD
 151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA
 201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE*

Further work revealed the complete nucleotide sequence <SEQ ID 551>:

   1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG
  51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC
 101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA
 151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG
 201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG
 251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC
 301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT
 351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG
 401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG
 451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC
 501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG
 551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC
 601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG
 651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT
 701 CGGAATAG

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:

   1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ
  51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN
 101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR
 151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI
 201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*
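
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short translation sketch. It assumes the standard genetic code and reading frame 1, and reports any codon containing an ambiguity code (N, m, s and so on) as X; the function name translate is hypothetical. The input is the first 30 bases of <SEQ ID 551>, with the block spacing of the listing left in place.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string into one-letter amino acids.

    Whitespace is ignored, so blocks can be pasted as printed.
    Stop codons become '*'. Codons containing anything other than
    A, C, G or T become 'X'. A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 bases of SEQ ID 551.
print(translate("ATGATGAAGC GGCGTATAGC CGTCTTCGTC"))
```

For this input the sketch prints MMKRRIAVFV, the first ten residues of ORF136-1.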

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N.
meningitidis:

30 10 20 30 40 50 59 

orf 136. pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 
I II I I II I I I : i I I : I II I 1 I I I I M I I I II I I I I I I I I I I I I I I 11 I I I ! I I I I I 
orf 13 6a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 

10 20 30 40 50 60 

35 

60 70 80 90 100 110 119 

orf 136 . pep PCGI VFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 
1111111:11111 : I I I M I I I I I : I I I I I I M I t I I M I I I I II I I I M I II MM 
orf 136a PCGI VFGTLLFRHXSTHCL YGKAAVGN A VAHEHPVADVVNRN AN AFALFD IGQFAGFIVQ 

40 70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 136. pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
I : : I : I I M t M I I I t I I I I M I I II I I I : : I : I : I : : : : 

45 orf 136a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 

130 140 150 160 170 180 

180 190" 200 210 220 230 

orf 136. pep AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX 

50 : M : I : : : : I I I 1 II M M i I I i I I f I I I I I I I I I I I III 

orf 13 6a . R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence <SEQ ID 553> is:

   1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG
  51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC
 101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA
 151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG
 201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG
 251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC
 301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT
 351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG
 401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG
 451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA
 501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG
 551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG
 601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG
 651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT
 701 CGGAATAG

This encodes a protein having amino acid sequence <SEQ ID 554>:

   1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ
  51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN
 101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR
 151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES
 201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

orf 136a. pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
| | | | | | M | | | : ] I [ : I I I t M I I I I 1 I I I I I I I I I I I I I I I I I I I I I t I I I I I I I I 
?0 orf 136-1 MM KRRI AVFVLFPQI I RVLGQLLPKIVNTVP AH RMLFQIFGMFFFFIHQQ YLPGIAEIDS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 136a pep pcGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 

25 " * II : I I I I I : i I I I I I I H I : I I II I I I I M I I I I I I I I I M I I I I I I I I I M I 

orf 136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 

70 80 90 100 110 120 

130 140 150 160 170 180 

30 or f 136a . pep HAINVKTVKI NIVDPHMFAN FAX FAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 

I : : 1 : I M I I M I I I I ! I I I I I I I i I I M : : I : I : I : : : : 

orf 136-3 HTVN I KT VKI N I V D PHMFAN FAV FAVLEKRD FDHGK I QGGNNAAAFPKKLAPKI FEC FTG 

130 140 150 160 170 180 

35 190 200 210 220 230 

0 r-f ] 36a . DeD R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

||: i : : : : I I I 1 M M I I I I I I M It M I II I I I I 1 I I I I i 

or^l3 6-l AFVGTVYRFVCLFYI I NDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

40 

Homology with a predicted ORF from N. gonorrhoeae

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from
N. gonorrhoeae:

or^l36.peo MKRRI AVFVLFPQI IRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59 

45 ' ' I i I I I 1 I I I I : I II : II I I I I I 11 M I I I I I II I I I I I I I I I I I I : I I I M I M II I 

orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60 

0^136. pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 119 
I ! I || I : I I I I I I I I I M I I I I I I I II 11 I 1 I I I I I : I I I I I I I I M I I I I I MM 
50 orf 136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120 

orf 136. pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179 

I M I II I II I I I I I I I I M I I II I I I I II I I II I 1 I I I I i I I M I I 1 I I I I I I : I I M I I 
orfl36ng HTVNIKTVKINIVDPHMFAN FAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180 

orf 136. pep AFVGTVYRFVCLFYI INDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234 

i I : I I I I I I M I M I Ml I I I I I : I II I I I I I M I I M I I I M I I I I M II 
orfl36ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235 

The complete length ORF136ng nucleotide sequence <SEQ ID 555> is: 

   1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG
  51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC
 101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA
 151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG
 201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG
 251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC
 301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT
 351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG
 401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG
 451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC
 501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG
 551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC
 601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG
 651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT
 701 CGGAATAG

This encodes a protein having amino acid sequence <SEQ ID 556>: 

   1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ
  51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN
 101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR
 151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI
 201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE*

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap:

orf 136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 
I I I M I I I I I I : I I 1 : I I I 1 I I I I II I I I I I I I I I I I I I M I I I I I : I I I I I I I i I I I 
orf 13 6-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

25 orfl36ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 

I I I I I I : I I I I II I I II M I I I I I I II I I I I I I I I I : I I I M I I I II 11 M MINI 
orf 13 6-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 

orf 13 6ng HTVNIKTVKI NIVDPHMFAN FAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 

30 I M I I I I I M M 1 I I I I I I I I I I I I I I II I I I It I I I I I I I I I I II I I II I I I : I I M I I 

orf 13 6-1 HTVN IKTVKIN I VDPHMFAN FAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKI FECFTG 

orf 13 6ng AFAGTVYRFVCLFYI INDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX 

I I : I I I I M I I M I I I I I I I I I I : II I I II I I I I I I I 1 1 I I I I II I I I M M I I 
35 orf 13 6-1 AFVGTVYRFVCLFYI INDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 67 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC
51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC
101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC
151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT
201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT
251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA
301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC
351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA
401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC.

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>:

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA..

Further work revealed the complete nucleotide sequence <SEQ ID 559>:

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC
51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC
101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC
151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT
201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT
251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA
301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC
351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA
401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT
451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC
501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG
551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG
601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA
651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC
701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG
751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT
801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG
851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT
901 TGA

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *
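
The correspondence between a nucleotide listing and its amino acid listing can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the choice to report any codon containing a non-ACGT symbol as `X` are illustrative assumptions, not a description of the software used to prepare the sequence listings.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1.

    Whitespace is ignored. Any codon containing a symbol other than
    A, C, G or T (for example an ambiguous base call N) is reported as 'X'.
    A trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 60 nucleotides of SEQ ID 559, copied from the listing above.
orf137_1_start = (
    "ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC CGCCGCCGCG"
)
print(translate(orf137_1_start))
```

Applied to the first 60 nucleotides of SEQ ID 559, the sketch yields the first 20 residues of SEQ ID 560. A real analysis would normally resolve ambiguous codons where the ambiguity does not change the residue; this sketch deliberately does not.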

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 137 . pep MENMVTFSKI RPLLAIAAAALLAAXRTAGNNAVRKPVQTAK PAAWGLALGGGASKGFAH 
I I I I ! I II I I ! II I I I I I I II I I I ) II I I I : M I I I I I II ! t I II I I I I I I I M I I I I 
orf 137a MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 137 .pep VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 
I I I I I I I I I I I M I ! II I > I I M I I I I : I I I I M I I I I I I I I I M I I I I I II II I I I I : I 
orf 137a VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

70 80 90 100 110 120 

130 140 149 

orf 137 .pep F I KG AKLQN YIN RK LRGMQ I QQ F P I K FAA 
MM I I ! I I I I I I • i M M I M M I I 
orf 137 a FIKGEKLQNY I NRKVGGRRIQQFP I KFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

130 140 150 160 170 180 
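
The identity figure quoted for an ungapped overlap can be reproduced by counting matching positions. The sketch below is illustrative only: the helper `percent_identity` is a hypothetical name, the two strings are typed from the ORF137 and ORF137a listings (the first 149 residues of ORF137a), the valine pairs at positions 45-46 and 76-77 are written as VV as the nucleotide sequences encode, and an undetermined residue (X) is counted as a mismatch.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Partial ORF137 (149 residues), in blocks of ten as in the listing.
ORF137 = (
    "MENMVTFSKI" "RPLLAIAAAA" "LLAAXRTAGN" "NAVRKPVQTA" "KPAAVVGLAL"
    "GGGASKGFAH" "VGIIKVLKEN" "GIPVKVVTGT" "SAGSIVGNLF" "ASGMSPDRLE"
    "LEAEILGKTD" "LVDLTLSTNG" "FIKGAKLQNY" "INRKLRGMQI" "QQFPIKFAA"
)

# First 149 residues of ORF137a (strain A).
ORF137A = (
    "MENMVTFSKI" "RPLLAIAAAA" "LLAACGTAGN" "NAARKPVQTA" "KPAAVVGLAL"
    "GGGASKGFAH" "VGIIKVLKEN" "GIPVKVVTGT" "SAGSIVGSLF" "ASGMSPDRLE"
    "LEAEILGKTD" "LVDLTLSTSG" "FIKGEKLQNY" "INRKVGGRRI" "QQFPIKFAA"
)

print(round(percent_identity(ORF137, ORF137A), 1))
```

Ten of the 149 positions differ, so the sketch gives 139/149, which rounds to the 93.3% stated above. Alignments that contain gaps would need a different denominator convention, which this sketch does not address.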

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA
401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT
451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC
501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 
551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 






601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA
651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC
701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG
751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT
801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG
851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT
901 TGA



This encodes a protein having amino acid sequence <SEQ ID 562>: 



1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *



ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 






orf 137a . pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 
I | | | | | | | | I I I I I I I I I I I I I 1 I I I I M I I I : I I I I I I I I I I If I II I I ! I M I I I 1 I I 
orf 137-1 MENMVTFSKI RPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

orf 137a. pep VG 1 1 KVLKENG I PVKWTGTSAGS I VGSLFASGMS PDRLELEAE I LGKT DLVDLTLSTSG 
M | I I I I I M M I I M I I 1 I I I I I I I I I M I M I I I I I I I I I I I I I M I I ! I I I I I I I I I 
orf 137-1 VGIIKVLKENGI PVKWTGTSAGS IVGSLFASGMS PDRLELEAE I LGKTDLVDLTLSTSG 

orf 137a . peo FIKGEKLQNY INRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

I 11 I I I I I I 1 I I I I M I I : I I I I I I II M I I I I I II I I I M M I I M I I I M I I I I t I I I 
or f 1 3 7 - 1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAI PNV 

orf 137a. pen FQPVI IGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV 
I I M I I I I I II I 1 I I I M I I I I I I 1 I I I M I I I I I II : I I I I I I I I I II I t I I I I 

orf 137-1 FQPVI IGRHTYVDGGLSQPVPVSAARRQGANFVIAVDI SARPGKNISQGFFSYLDQTLNV 

orf 137a . peo -MS VSALQNELGQADW I KPQVLDLGAVGGFDQKKRAIRLGEEAARAALPE IKRKLAAYRY 
I I I t I I 1 I I I I I I I I I I I I II I I 1 I I I I I I M II I I I I II I I I I I I I I I I I I I I M I I I I 
orf 137-1 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPE IKRKLAAYRY 

Homology with a predicted ORF from N. gonorrhoeae 

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from
N. gonorrhoeae:



orf 137 . peD MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 
I I S I II I I I I I : I I M I I I I I I I II I I I I : I I I I I I I I I I I M : I M I I I I I I M I I 
o r f 1 3 7 ng MENMVT FSKIRS FLAI AAAALLAACGTAGNNAARKPVQTAKPAA WALALGGGASKG FAH 



60 



60 



orf 137 .pep VGIIKVLKENGI PVKWTGTSAGS I VGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120 

: M : I M I I I I I It I I I I I 1 I I I I I I 1 : I : I M I I I I I I I 1 M I I I i I I M I t I I M I : I 
orf 137ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120 



orf 137 .pep 
orf 137ng 



FIKGAKLQNYINRKLRGMQIQQFPIKFAA 14 9 

INI MINIMI: I i I I I I I M I M 

FIKGEKLQNY I NRKVGGRQ I QQFP I KFAAVATDFETGKAVAFNQGNAGQAVRASAAI PNV 180 



The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC
51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC
101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC
151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT
201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT
251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA
301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC
351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA
401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT
451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC
501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG
551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG
601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA
651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC
701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 564>:

1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL
51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV
201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 

orfl37ng MEN MVT FS K I R S FLA I AAAAL LAAC G T AGNN AARK P V QT AK P AA W ALALG G G AS KG FAH 

20 I I I I I I I I I I I : I I I I I I I I I M I I I I I I I I : II 1 I M II I I I I I : I I I I I M I II I I I 

or f 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

orf 137ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 
: I I : I I 1 I I I I I I I I I II I I I I I M I I I I : I M I M I I M I I I I I I I I I I I I M I I II I I 
25 orf 137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf 137ng FIKGEKLQNY INRKVGGRQI QQFPIKFAAV AT DFETGKAVAFNQGNAGQAVRASAAIPNV 

I I I I I I I M If M I I I I I I I I I I I I I I I I I i I I I I I I I M I I I M I I M II I I I I I I I I I 
orf 137-1 FIKGEKLQNY INRKVGGRQI QQFPIKFAAV AT DFETGKAVAFNQGNAGQAVRASAAIPNV 



30 



orf 137ng FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV 
I t I I II I I I I I I I I I I I II I I I I I I M I I I I I I I I I I II I I : 11 :: I I I I I I I I I I M I 
orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 



35 orf 137na MSVSVLQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

I i I I : t I I I I I I I I I II I I I II I I I I I I I I I M I I II I I I 1 I I I I I I I M I I M I I I I I I 
or f 13-7 MSVSALQNELGQADW I KPQVLDLGAVGGFDQKKRAIRLGEEAARAALPE IKRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
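
A lipid attachment site of this kind can be located with a simple pattern search. The sketch below is an assumption-laden illustration: it uses a commonly cited simplified "lipobox" consensus, [LVI][ASTVI][GAS]C, rather than whatever motif definition the original computer analysis applied, and the function name `find_lipobox` and the cut-off `max_cys_pos` are hypothetical.

```python
import re

# Simplified lipobox consensus (an assumption for illustration, not the
# motif definition used in the original analysis).
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox(protein, max_cys_pos=40):
    """Return (motif, 1-based position of the Cys) for the first lipobox-like
    match whose cysteine lies within the first max_cys_pos residues, else None."""
    for match in LIPOBOX.finditer(protein):
        cys_position = match.end()  # end() is exclusive, so it equals the 1-based Cys index
        if cys_position <= max_cys_pos:
            return match.group(), cys_position
    return None


# First 40 residues of ORF137ng (SEQ ID 564).
orf137ng_nterm = "MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTA"
print(find_lipobox(orf137ng_nterm))
```

On the N-terminus of ORF137ng the pattern picks out LAAC, with the cysteine at residue 25. A motif hit alone does not establish lipidation; the sketch only shows where such a site would sit in the sequence.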

Example 68 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA
151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA
201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG
251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA
301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA
351 ACACGAAGGG CTGCTATTC..

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET
101 MFKAVHGWEH VQQALDKHEG LLF

Further work revealed the complete nucleotide sequence <SEQ ID 567>:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA
151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA
201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG
251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA
301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA
351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC
451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT
501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA
551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC
601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG
651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG
701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT
751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC
801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT
851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA



This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL
51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET
101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY
151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH
201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG
251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 138 . peo MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 

I M I I I I I I I I I II I I I t I I M I I I I I I 1 I I I II I I I I II I I I II I I I I I I II I I II I I 
orf 138a MFRLQFRLFP PLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKE DRARIVAN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 138 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPED I ETMFKAVHGWEH VQQALDKHEG 
I I I I I I I I I I I M I I I I I I I I I I I M I I I I M I I I II M I I M I I I I I I I I I I I I I I I I I 
orf 138a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPED I ETMFKAVHGWEH VQQALDKHEG 

70 80 90 100 110 120 



orf 138. pep LLF 
45 ill 

orf 138a LLFIT PHI GSYDLGGRYISQQLPFPLTAMYKPPK I KAIDKIMQAGRVRGKGKTAPTS I QG 

130 140 150 160 170 180 

The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA
151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA
201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG
251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA
301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA
351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC
451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT
501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA
551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC
601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG
651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG
701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT
751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC
801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT
851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA



This encodes a protein having amino acid sequence <SEQ ID 570>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL
51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET
101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY
151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH
201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG
251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP*



ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 

or f 138a . pep MFRLQFRLFP PLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARI VAN 

I I I 1 I I I I I 1 I I M I I II ! (I I II I I IN I 1 I I M I 1 I I I I I I I I I I I M II II I 

or f 138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 138a . pep MRQ AGMN P D PKT VKAV FAET AKGG LE LAP AF FRK PE D I E TMFKAVHGWE H VQQALDKHEG 
I I I I I : I I II I 1 I I II I II I I II I I I I ! I 11 I I I I M I I I II I I II I I M I 1 I I I I I I I I 
orf 138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf 138a . pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
I I I I I I I I I I I I I I I I I I I I I I I I II I I M I I I I I M I I I I I I I I I I M t I I II I I M I I 
orf 138-1 LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orf 138a . pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
I I I I II I I I I I I I I 1 I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I M I I M I I 
orf 138-1 VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

or -fl 38a. pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
i 11 I I I I I I I I I I I I I U I II II I II I I M I I I I I I I I M I I I I I I I I I I I I II M I I 
orf 138-1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

Homology with a predicted ORF from N. gonorrhoeae 

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from 
N. gonorrhoeae: 

orf138.pep  MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60
            |||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf138ng    MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60

orf138.pep  MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120
            ||||||||| :||||||||||| |||||||||:|||||||||||||||||||||||| ||
orf138ng    MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120

orf138.pep  LLF 123
            |||
orf138ng    LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180

The complete length ORF138ng nucleotide sequence <SEQ ID 571> is:



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA
151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA
201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG
251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA
301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA
351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC
451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT
501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA
551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC
601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA
651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG
701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC
751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA
801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC
851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL
51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET
101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY
151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH
201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF
251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP*

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

15 or f 138-1 .pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

I 1 | I M I I I I I I I I I I I I I I I I I I I I I I M I t I M I I I M i I I I I I I I I ! I M I I I I I I 
orfl38ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 138-1 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
20 I I I 1 I I I I I : M I I I I I I I I I I I I I I I I I I : M I M I I I I I I I I II I I M I 1! I I II 

orf 138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 

orf 138-1 . pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
I M I I I I I I I I I I I I I I I M I II I I I! I I I I I I 1 I I I M I M I II I I I I I I I I I I : I I I 
25 orf 13 8ng LLFITPKIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 



30 



orf 138-1. DeD VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
I I M I I I M : I II M : I I I I I I I I II I I I II : II I I I II 1 I I I I I I I I M II I I I I M I 
orf 138ng VKQI IKALRAGEATI I LPDHVPS PQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF 

orf 138-1 .peo CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
I I I I I I I I I I I I ! I I I I M I 11 : I I I I II I I M I : I I M M I I I I I I I I I I II I I 
orf 138ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWI RRFPTQYLFM YNRYKTP 

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens: 

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253

Score = 80.8 bits (196), Expect = 9e-15

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%)

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159
+ + V G E +++AL G+G++ IT H+G+++ L Y SQ P Y+PPK+KA+D
Sbjct: 94 LVREVEGLEVLKEALASGKGWGITSHLGNWEVLNHFYCSQCKPI 1 FYRPPKLKAVD 150

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219
++++ RV+ K A + +G+ +IK +R G I D P P E G++ FF A
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250
T + +F RLPDG G+
Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
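
The text does not say how the putative transmembrane domain was predicted. One widely used way to look for candidate membrane-spanning segments is a sliding-window hydropathy average on the Kyte-Doolittle scale; the sketch below illustrates that general approach and is not a reconstruction of the original analysis. The function name `hydropathy_profile`, the 19-residue default window, and the treatment of unknown residues (scored as 0.0) are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean Kyte-Doolittle hydropathy of every full window along the sequence.

    Residues outside the 20 standard amino acids (e.g. X) score 0.0.
    Returns an empty list when the sequence is shorter than the window.
    """
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# First 50 residues of ORF138ng (SEQ ID 572).
orf138ng_nterm = "MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLL"
profile = hydropathy_profile(orf138ng_nterm)
best_start = max(range(len(profile)), key=profile.__getitem__)
print(best_start + 1, round(profile[best_start], 2))
```

The printed pair is the 1-based start of the most hydrophobic window and its mean score. Windows with a high mean are candidates for membrane-spanning segments, but any threshold is a rule of thumb and would need to be chosen by the analyst.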

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it
is a useful immunogen.

Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>: 

1 GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG
51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG
101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG
151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC
201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC
251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT
301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG
351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT
401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG
451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT
501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG
551 CGCGGGCGAT GGTGCTG..

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 

1 ..AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW
51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV
101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV
151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVX..

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 

1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG
301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT
351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG
601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC
701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC
801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT
851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT
951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG
1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC
1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC
1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT
1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT
1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG
1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>:

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL
501 LDGGEGGKQT ETL*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of N.
meningitidis:

10 20 30 

orfl39.pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 

I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
15 orf 139a QSVGEYVLLAF AAAVXSVCCLFXLLAIW KAWS AGESWRVLME SET WQAVWNTXRFS AAA 

270 280 290 300 310 320 

40 50 60 70 80 90 

or f 13 9 . peD VYAAAVLGWY7^AP ARRSAWMRGLM FXPFM\^SPVCVSAGVLLL YPQWTAS LPLLLAMYAL 
20 I I II I I I I I II I I I I I I I I I I I I II I I M I I I ! I I II I M I I I I I I II I M I I I II I 

o r f 1 3 9a VYAAAVLG WYAAA ARRS AWMRGLM FL P FMVS PVCVS AG VLLL X PQWT AS LPLLLAMYAL 

330 340 350 360 370 380 

100 110 120 130 140 150 

25 orf 139. oep LAY?rVA KDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

I I II I M I t I I I I I M I M I I I I I I I I I M I I I I I I I II I I I I 1 I I I I I I I I I I I I I I I 
orfl39a LAYPrVA KDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 
390 400 410 420 430 440 

30 160 170 180 189 

orf 139. Dep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 
I I I I II I I II i I I M I M M I I I I I I I I I I II I I I 
orf 13 9a GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARA MVLTLLLAAFALGXFLLL DGGEGG 
450 460 470 480 490 500 

35 The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 

1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG
301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN
351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA
601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC
701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC
801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT
851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT
951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG
1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC
1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC
1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT
1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT
1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG
1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA

This encodes a protein having amino acid sequence <SEQ ID 578>: 

1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ
151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL
501 LDGGEGGKRT ETL*

ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 



20 



orf 139a . pep MDGRRWAVWGAFALLPSAFLAAMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

I | | | 1 I : ! I I I I I I I I I I I M : I I M I I I I I I I I M I i I I I I I ! I I I I I I I I M 

orf 139-1 MDGRRWVVWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

or ^13 9a Dep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG 

M I M I I t I I I I I I I I I I i I i i I I I t I I I 1 I I M I I I I I I M I I II I I I I I I I 

orf 3 39-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 



orf139a.pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 

MM! II I I I II I I 11 I I II I II M I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orf 139a . pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 

30 I I I II I I I M I I I I I ill M I I I I I I I I I I I M I I i I I I I M M I I I I II I I M I I I I I 

orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

o^f 139a . oep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 
I | M I M I I M II M I I I I I M I I I I I I II I I I I M I I I I I I I I I I I I I I II I I M II 
orf139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS 

orf 39a . pep AGE SWRVLMESETWQAVWNTXRFSAAAVYAAAVLGWYAAAARRSAWMRG LMFLPFMVSP 
| M | I M I I I I I I I 1 ! I I I I I I I I I I I I I I I I I I I II I II I II I I M I I II I I I I I I II 
orf 139-1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 






orf i 39a pep VCVS AGVLLLXPQWT AS LPLLLAMY ALLAY PFVAKDVLSAXDALPPDYGRAAAGLGANGF 
I I I I I t I II I I I I I I I I I 1 III II 1 M I I I I I M I I I I I I I II I I M I II I I I II I I I 
orf 13 9-1 V CVS AGVLLLYPQWTAS LPLLLAMY ALLAY PFVAKDVLS AW DALPPDYGRAAAGLGANGF 



orf139a.pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY 

| | | M I | I I I I I I I I I II I I I I I II I II I I I I I I I M I I I I I I I M I I I I I I I Ml 
orf 13 9-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 139a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 
50 I M I ! I M I I I I I M I I I I M I I I I I I : I I M I ■ 

orf!39-l ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from 
55 N. gonorrhoeae: 

orf 139 pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 30 

I I I I ! I I I I M I I I I I : I I I I M 1 I I I M 
orf!39ng QSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 

60 orf 139 . pep VYAAAVLGWYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90 

| : | | | M I I II I I Hi : M I I I: I • I I I I I -I II I II I I I I I I I I I I I I I I I 

orf139ng VFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387 

orf139.pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150 

I I I I I II I I I ! I I I I I I I I I I I II I I I M I I I I I I I M I I I I II II I I I I I I I II f I I I I 
orf 139ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 4 47 

orf 139 .pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 18 9 

I I I I I I I I I I ! I I II II II I I I I I I I t I I I I I I I I I I I I 
orf 13 9ng GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a 
protein having amino acid sequence <SEQ ID 580>: 



1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>: 



1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG
301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT
351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA
601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC
701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC
801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT
851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT
951 GTGGAATACt rtGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA
1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC
1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC
1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT
1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT
1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA
1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG
1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA



This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-1>: 



1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*






ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap: 

or *1 3 9ng MDGRCWAVRGAFSLLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
| | | | | : j | | I : M I I I it I I I I I I I I I M I I 1 i I I I 1 I I t I I I i i II I! M I i I I II 1 
orf 139-1 MDGRRWWWGAFALL PSAFLAVMWAPLWAVAAYDGLAWRAVLS DAYMLKRLAWTVFQAA 

or f 1 3 9ng ATCVLVLPLGVPVAWVIARIAFPGRALVLRLI^LPFVMPTLVAGVGVIALFGADGLLWRG 
I I I I I I M I I I I M M I I I I 1 ! I I I I I I I M I I I I 1 I M 1 I M i I I M I M ! I I I i I I M 
orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orfl39ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 
I I I I I I I I I M I I M I I I I I I I I I I I I I : I M II I I I II I I M I I I I I II M I I I I I I I I 
orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

o^£13 9ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATTOVEIYQLVMFELDMAGASALVWLVLGVTA 
t I I I I I M I i I I I M I I i I t I I I I I I I I I I I I I I I I I I M I I I I I I I I : II I M I I ! I I 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orfl39ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 
I I I I M I I II I I I I I I I I 1 I I M I I I M I I I I I I I I I I : : I I I I I I I M I I I I I I I II I 
orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orf 13 9ng AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSP 
I I I I I I I I I I I I II I 11 I I I M I I I M : I I I I I I I II M I I I I : I i I I I : I I I II I I I 
orf 139 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 

o*"fl39ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 
MINIMUM M M I I I I I I I 1 I I I I I M III I I I I I I I I I I I I I I I I I I I I I I II I 
orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf!3 9ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 
M I I I I I I I I I I I I I I I I I M I I I I I I I I I M I I I I I M I I I I t I I I I I M I M ! I I I I I 
orf!3 9-l QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 



orf139ng ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL 
I I I I 1 I I I I I : M I : ! I I I II : I I I I I : I I I I 
orf139-1 ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 70 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C... 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV . 
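
The correspondence between a listed DNA sequence and its amino acid sequence can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and a reading frame starting at nucleotide 1; the function name `translate` and the choice to report codons containing an ambiguous base (such as N) as X are illustrative assumptions, not a description of the software used for this work.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 60 nucleotides of the partial sequence <SEQ ID 583> shown above.
first_60 = "ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC GGCGGCGGCA"
print(translate(first_60))  # MDGWTQTLSAQTLLGISAAA
```

The printed residues agree with the first 20 residues listed for ORF140.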

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG 






201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA



This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF140 shows 95.4% identity over a 87aa overlap with an ORF (ORF140a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf140.pep  MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD
            |||||||||||||||||||||||||||||:||||||||||||||||||||||||||||:|
orf140a     MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND
                    10        20        30        40        50        60

                    70        80
orf140.pep  ILVKNFGGTLGGVALLVGLGAMLERLV
            :|||||||||||||||||||||| |||
orf140a     VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF
                    70        80        90       100       110       120
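
The identity figure quoted for an ungapped overlap such as this one can be reproduced by counting matching positions. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, it assumes both sequences start at the same aligned position with no gaps, and it is not the alignment program used to generate the comparison above.

```python
def percent_identity(query: str, subject: str):
    """Count identical residues over the ungapped overlap of two sequences.

    Returns (matches, overlap_length, percent). Assumes the sequences are
    already aligned from their first residues, with no insertions or deletions.
    """
    overlap = min(len(query), len(subject))
    matches = sum(1 for q, s in zip(query, subject) if q == s)
    return matches, overlap, 100.0 * matches / overlap


# ORF140 (partial, 87 residues) and the first 100 residues of ORF140a.
orf140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD"
    "ILVKNFGGTLGGVALLVGLGAMLERLV"
)
orf140a = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADAL"
)

matches, overlap, pct = percent_identity(orf140, orf140a)
print(f"{matches}/{overlap} identical ({pct:.1f}%)")  # 83/87 identical (95.4%)
```

Four positions differ (residues 30, 59, 61 and 84), which gives the 95.4% identity over 87 residues stated above.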

The complete length ORF140a nucleotide sequence <SEQ ID 587> is: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC






501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 588>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 

orf140-1.pep  MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND  60

orf140-1.pep  ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF  120
              :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF  120

orf140-1.pep  GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG  180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG  180

orf140-1.pep  ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV  240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV  240

orf140-1.pep  VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK  300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK  300

orf140-1.pep  RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC  360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC  360

orf140-1.pep  FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG  420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG  420

orf140-1.pep  FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV  461
              |||||||||||||||||||||||||||||||||||||||||
orf140a       FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV  461

Homology with a predicted ORF from N. gonorrhoeae 

ORF140 shows 92% identity over a 87aa overlap with a predicted ORF (ORF140ng) from 
N. gonorrhoeae: 



orf140.pep  MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD  60
            ||| |||||||||||||||||||||||||:|||:|||||||:||||||||||||||||:|
orf140ng    MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND  60

orf140.pep  ILVKNFGGTLGGVALLVGLGAMLERLV  87
            :|||||||||||||||||||||| |||
orf140ng    VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF  120



The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK
301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQTLIAFIG
451 FALSALLFAI V*



Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 



1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG
251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG
851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA



This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>: 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK
301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIAFIG
451 FALSALLFAI V*



ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap: 

orf 14 0ng-l .pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 
Ml M I I I M M I II 1 II I M I M II I I I I I I : I I M I I I : I I I I I I I I I I I I I I I 1 I I 
orf140-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 

orf 140ng-l . pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 
: | I I I M M I I I 1 I I I I I I M I [ 1 M I I I I I I I I I I I I I M I I I I M I M I I I I I I I I I 
orf 14 0-1 ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 

orf!40ng-l pep G FP I FFDAGLI VML P I VFATARRMKQDVL PFALAS VGAFS VMH VFL PPHPG P I AASE FYG 

| | | | I i | | | M M M I I I I I I I I I I I I I I I : I I M I I I I I I I II I I I I I I I I I I I 

orf 14 0-1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 

orfl40ng-l.pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKE PAKAGTV 
I M M I I I I I I I I I I! I I I I M M i I I I I I i : I I I M I I I I I I I I II : I I ! I I I M I I I 
orf 14 0-1 AN IGQVLILGLPTAFITWYFSGYMLGKVLGRT I HVPVPELLSGGTQDNDLPKE PAKAGTV 



orf140ng-1.pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK 

I i : I I I I I M I I I I I I I I 1 I I I I I I I I I I I I I I i I M : I i I M : M I I M I : M : M I I I 
orf140-1 VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 

orf 14 0ng-l.pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 
20 | 1 I I I I : I I I I I M I I I I : II I I! I I 1 I I I I I M I I M I I I I II II I I I M I I I I I M I I 

orf 14 0-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 

orf 14 0ng-l.pep FLVALALR I AQG S AT VALTT AAALMAP A VAAAG FT DW QLAC I VLAT AAG S VG C S H FN D S G 
M II I I I I I I I I I I I I I I I I I I I I I M I I I I I I M I M I I I I I II I I I I I I I I I I I M M 
25 orf 14 0-1 FLVALALR I AQG S AT VALTT AAALMAP A VAAAG FT DW QLAC I VLAT AAG S VG C S H FN D S G 

orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 
I I I I I I II I I II I I i I I II I I I I I I I I : I I I I I I M I I I I I 
orf 14 0-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

Furthermore, ORF140ng-1 is homologous to an E. coli protein: 

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454; 
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa 
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454 
Score = 210 bits (529), Expect = 1e-53 
Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%) 

Query: 88 ET S GG AQ S LADAL I RM FG E KRA P FA PGV AS L I FGFP I FFDAGLI VMLPIVFATARRMKQD 14 7 

E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI + + A+ K 
Sbjct: 80 EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139 

Query: 148 VLPFALASVGAFSVMHVFLPPHPGPIAASEFYGANIGQVLILGLPTAFITWYFSGYMLGK 207 

L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +1 GY K 

Sbjct: 140 PLKFGLPVAGIMLTVHVAVPPHPGPVAAAGLLHADIGWLTIIGIAIS-IPVGWGYFAAK 198 

45 Query: -208 VLGRAIHVPVPELL SGGTQDSDPPKEPAKAGTWAVMLIPMLLIFLNTGV 257 

++ + + E+L G T+ SD P A V ++++IP+ +1 T 

Sbjct: 199 IINKRQYAMSVEVLEQMQLAPASEEGATKLSDKINPPGVA-LVTSLIVIPIAIIMAGT— 255 

Query: 258 SALISEKLVSADETWVQTAKMIGSTPXXXXXXXXXXXXXXGRKRGESGSTLEKTVDGALA 317 
50 +S L+ + T ++IGS +RG S + AL 

Sbjct: 256 VSATLMPPSHPLLGTLQLIGSPMVALMIALVLAFWLLALRRGWSLQHTSDIMGSALP 312 

Query: 318 PACS VI LITGAGGMFGGVLRASGI GKALADSMADLG I PVLLGC FLVALALR I AQG SXXXX 377 
A VIL+TGAGG+FG VL SG-rGKALA+ + + +P+L F+++LALR +QGS 
55 Sbjct: 313 TAAWILVTGAGGVFGKVLVESGVGKALANMLQMIDLPLLPAAFIISLALRASQGS — AT 370 

Query: 378 XXXXXXXXXXXXXXXGFTDWQLACIVLATAAGSVGCSHFNDSGFWLVGRLLDMDVPTTLK 437 

G Q + LA G +G SH NDSGFW+V + L + V LK 
Sbjct: 371 VAILTTGGLLSEAVMGLNPIQCVLVTLAAC FGGLGASHINDSGFWIVTKYLGLSVADGLK 4 30 



Query: 4 38 TWTVNQTLIAFIGFALSALLFAIV 461 

TWTV T++ F GF ++ ++A++ 
Sbjct: 431 TWTVLTT I LGFTG FL I TWCWJAV I 4 54 



Based on this analysis, including the presence of a putative leader sequence 
(double-underlined) and several putative transmembrane domains (single-underlined) in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
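
Putative transmembrane domains of the kind referred to above are commonly flagged by averaging residue hydropathy over a sliding window. The sketch below is one such approach and is offered only as an illustration: the text does not state which prediction method was used, and the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold, and the function names are all assumptions.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Mean hydropathy of every full window; unknown residues count as 0."""
    values = [KYTE_DOOLITTLE.get(res, 0.0) for res in protein.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean reaches the threshold."""
    return [
        i + 1
        for i, mean in enumerate(hydropathy_windows(protein, window))
        if mean >= threshold
    ]


# First 50 residues of ORF140-1 <SEQ ID 586>.
n_term = "MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALAT"
print(candidate_tm_windows(n_term))
```

Windows that pass the threshold are only candidates; a hydrophobic N-terminal stretch may equally reflect a leader sequence, so such output would need to be interpreted alongside other evidence.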



Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>: 

1 . . GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA 

51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG 

101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC 

151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG 

201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG 

251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG 

301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC 

351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG 

401 TACTGATGTT TTTCCGTCCG.. 

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 



1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP..

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 
901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 
951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>: 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENI*






Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of N. 
meningitidis: 



orf 141 . pep 
orf 141a 



orf 14 1 . pep 



orf 141a 



10 20 30 

DFG I S PVYLWVAAAFKHLLS PWAADS YDVA 
i I ! 1 I M I I I I M ! I I III I 1 I I I I 1 : I 
WN PDE PAVYTAVEALAGS PT PLVAHLFGQI DFG I PPVYLWVAAAFKHLLS PWAADPYDAA 
40 50 60 70 80 90 

40 50 60 70 80 90 

R FAGVFFAVIGLTSCGFA GFNFLGRHHGRX WLILIGCIGLIPVAHF LNPAAAAFAAAGL 
I I I I I I I I i : I I I M I I I I I I I I I I I I M I I I I I I M M I I I :: I I I 11 II I I I I I I I I 
R FAGVFFAWGLTSCGFA GFNFLGRHHGRS WLILIGCIGLIPTVHF LNPAAAAFAAAGL 
100 110 120 130 140 150 



100 110 120 130 140 

or f 1 4 1 . D6D VLHGYSLARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RP 
I | j 1 II I I I I I I ! I I I I I i I I I II I I I I I I M I I 1 I M I M M I I I I I I I 
orf 14 la VLHGYSLARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RPWQSRRL MLTA 
160 170 180 190 200 210 

o r f 1 4 1 a VASLAFALPLMTV YPLLLAKTQPALFAQWLDDHVFGT FGGVRH I QTAFS LFY YLKNLLW F 

220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is: 



35 



40 



45 



50 



55 



60 



i 

51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 
901 
951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 



ATGCTGACCT 
AAAGCCGTGG 
TGTTTTCCCA 
GTCGAAGCAC 
TCAAATCGAT 
TCAAACATTT 
TTTGCCGGCG 
CGGTTTCAAC 
TCATCGGCTG 
GCCGCCGCCT 
TCGCCGGCGC 
TGATGTCGTT 
CTGCCCGTGC 
GACGGCAGTC 
CGCTGCTCTT 
GATCACGTTT 
CAGTTTGTTT 
TGCCGCTGGC 
TGGGGGATTT 
CGTCAATCCG 
TTGCCCTGTT 
GCGTTTGTCA 
CCTGTGGACG 
CCGAACGCGC 
ATTCCGATGG 
TACCCGCAAA 
GCGTTACCCT 
. GACGCGGCGA 
TTCCCCGGAA 
TAGGCGGCGG 
TTGCCGCACC 
GCCCCAAAAC 



ATACCCCGCC 
CTGTTGCTGT 
CGATTTGTGG 
TGGCAGGCAG 
TTCGGCATAC 
GCTGTCGCCG 
TGTTTTTCGC 
TTTTTGGGCA 
TATCGGGCTG 
TTGCGGCCGC 
GTGATTGCCG 
GGCAGCAGCT 
TGATGTTTTT 
GCCTCGCTTG 
GGCAAAAACG 
TCGGTACGTT 
TACTATCTGA 
GGTTTGGACG 
TGGGCGTCGT 
CAGCGTTTTC 
CGGCGCGGCG 
ACTGGTTCGG 
GGCTTTTTCG 
CGCCTATTTC 
CGGTTGCCGT 
AACATACGCG 
GACCTGGGCT 
AAAGCCACGC 
TTAAAACGGG 
CGACCTACAC 
GCGTCGGCGA 
GCGGATGCGC 



CGATGCCCGC 
TGATGGCGTT 
AATCCTGACG 
CCCCACCCCT 
CGCCCGTGTA 
TGGGCTGCCG 
CGTTGTCGGA 
GACACCACGG 
ATTCCGACCG 
CGGACTGGTG 
CCTCTTTTCT 
TATCCGGCGG 
CCGTCCGTGG 
CCTTTGCCCT 
CAGCCCGCGC 
CGGCGGCGTG 
AAAACCTGCT 
GTTTGCCGCA 
CTGGATGCTT 
AGGATAACCT 
CAACTGGACA 
CATTATGGCG 
CCATGAATTA 
AGCCCGTATT 
ACTGTTCACA 
GCAGGCAGGC 
TTGCTGATGA 
GCCCGTCGTC 
AGCTTTCAGA 
ACGCGGATTG 
TGTACAATGC 
CGCAAGGCTG 



CCGCCCGCCA 
TGCCTGGTTG 
AACCTGCCGT 
TTGGTTGCCC 
TCTTTGGGTT 
ACCCGTATGA 
CTGACTTCCT 
GCGCAGCGTC 
TACACTTTCT 
CTGCACGGTT 
GCTCGGTACG 
CATTTGCCCT 
CAAAGCAGGC 
GCCGCTTATG 
TGTTCGCGCA 
CGGCACATTC 
TTGGTTTGCA 
CGCGCCTGTT 
GCCGTTTTGG 
CGTCTGGCTG 
GCCTGAGACG 
TTCGGACTGT 
CGGCTGGCCC 
ATGTTCCTGA 
CCCTTGTGGC 
GGTTACCAAC 
CGCTGTTCCT 
CGGAGTATGG 
CGGCATCGAG 
TTTGGACGCA 
CGCTACCGCA 
GCAGACGGTC 



AAACCCACGA 

TGGCCCGGCG 

CTATACCGCC 

ATCTGTTCGG 

GCCGCCGCGT 

TGCCGCACGC 

GCGGCTTTGC 

GTCCTGATTC 

CAACCCCGCT 

ATTCTTTGGC - 

GGTTGGACGC 

GATGCTGCCC 

GTTTGATGTT 

ACCGTTTACC 

ATGGCTCGAC 

AGACGGCATT 

TTGCCTGCGC 

TTCGACCGAC 

TGCTGCTTGC 

CTTCCGCCGC 

CGGCGCGGCG 

TTGCCGTGTT 

GCCAAGCTTG 

TATCGATCCC 

TGTGGGCGAT 

TGGGCGGCAG 

GCCGTGGCTG 

AGGCATCGCT 

TGTATCGACA 

GTACGGCACA 

TCGTCCGCTT 

TGGCAGGGTG 
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1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 
1651 G AAAAT AT AT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 598>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG
551 ENILKTTD*
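
The correspondence between a nucleotide listing and the protein it encodes can be checked by conceptual translation. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the choice to report codons containing ambiguous bases as X are illustrative assumptions, not part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1, stopping at the first stop codon ('*').

    Whitespace (as used in the 10-base blocks of the listings) is ignored;
    codons with ambiguous bases such as N are reported as 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First 30 bases of the ORF141a listing.
print(translate("ATGCTGACCT ATACCCCGCC CGATGCCCGC"))  # MLTYTPPDAR
```

Applied to a full listing, the output can be compared block by block with the printed amino acid sequence.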

ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap: 



or f 14 la . pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 
I I I I I I I I I II M I I I M M I I I I M I I I I ! I I I I M I II I I I I II I I I I I I I I I I I I I I 
or f 1 4 1 - 1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 



orf 141a . pep LVAH L FGQ I D FG I P P V Y LWVAAA FKH L L S PWAAD P Y DAAR FAG V F FA WG LT S CG FAG FN 
I I I I I I II I I I I I I M I I I I II I I I I I II I M I I I I I I I I I I I I I I : I II I I I I I I I I 
orf 14 1-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 

orfl41a .pep FLGRHHGRSWLILIGCIGLIPTVHFLN PAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 
I I I I I I I I I I I I I I I I I I I I I I : : I I I I I I I I I I I I I I M I M I I I I I I I I I I I I I II I I 
orf 1 4 1- 1 FLGRHHGRSWLILIGCIGLIPVAHFLN PAAAAFAAAGLVLHGYSLARRR VIAASFLLGT 



orf 14 la . oep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 
I I I I I I I I II I I I I I I I I M I I I I I II I I M I I I M I I I I I I I I I I M I I I I I I I I I I I I 
orfl41-l GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 

orf 14 la .pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLW FAL PALP LA VWT VCRTRLFSTD 
I I I I II I I I I I I I I I I I I I I I : I I I I I I I I ! I I I I I I I I I I I I I I II I I I I II I I I I I I 
orf 14 1-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFAL PALPLAVWTVCRTRLFSTD 



orf 141a .pep WGILGWWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
I I M I I II 1 M M I 11 I I I I I I I II I I I I I 1 I I 1 I II M I I II I I I I I I I I I II I I I 1 I I 
orf 141-1 WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 



orf 14 la . pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDI DPI PMAVAVLFT PLWLWAITRK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I I M I I I I I I M 1 I I I I I 11 I I I I I I I I 
orf 14 1-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPI PMAVAVLFT PLWLWAITRK 



orf 141a . pep N I RGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSME AS LSPE LKRELSDGIE 
I I I M I I I I I I I II I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I 
orf 141-1 N I RG RQAVT NW AAG VT LT WALLMT L FL PWLDAAKS HAP WRS ME AS LSPE LKRELSDGIE 



orf 14 la . pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
II I I I I II I I I I I I I I I I I I I I I I I 1 II I i I I I I I I I i I I I ! I II I I I I I I I I II 1 I I 
or f 1 4 1 - 1 CIGIGGGDLHTRI WTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 



orf 1 41a. pep SKFALIRKTGENI 
I II II I I I MM 
or f 1 4 1 - 1 SKFALIRKIGENI 



Homology with a predicted ORF from N. gonorrhoeae

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from
N. gonorrhoeae:

orf141.pep                                DFGISPVYLWVAAAFKHLLSPWAADSYDVA 30
                                          |||| |||||||||||||||||||  ||:|
orf141ng    WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126

orf141.pep  RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL 90
            ||||||||||||||||||||||||||||| |||| ||||||||||||:||||||||||||
orf141ng    RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186

orf141.pep  VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140
            ||||||||||||||||||||||||||||||||||||||||||||||||||
orf141ng    VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 246



An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino
acid sequence <SEQ ID 600>:

1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKPWLLL
51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI
101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG
151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA
201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL
251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL
301 KNLLWFAPPG LPLAVWTVCR TRXFSTDWGI LGIVWMLAVL VLLAFNPQRF
351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF
401 AMNYGWPAKL AERAAYFSPY YVPDIDPIPM AVAVLFTPLW LWAITRKNIR
451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR
501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA
551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD*



Further work revealed the following gonococcal DNA sequence <SEQ ID 601>:

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA
51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG
101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG
201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT
251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC
301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC
401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc
451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC
501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT
551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC
601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT
651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC
701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC
751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT
801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC
901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC
951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC
1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG
1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT
1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG
1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT
1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG
1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT
1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA
1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA
1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG
1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG
1651 GAAAATATAT TAAAAACAAC AGATTGA



This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN
251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD
301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENILKTTD*

ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap:

orfl41ng-l.pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP 
I I ! I I I I I II I I I I I I I I I I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I i t I 
orf 141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 

orf!41ng-l .pep LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN 
I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M I I I I I I II I I I 
orf!41-l LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 

orf 141ng-l.pep FLGRHHGRSWLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 
I I II I I I I II M I I I I I I I I I I I I I I I I I M II I I I I I I I I I M 11 I I I I I I I I I I I I 1 
orf 141-1 FLGRHHGRSWLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

orf!41ng-l .pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRI^LTAVASLAFALPI^TVYPLLLAKT 
I I I I I I II I I I I I I I I I I I 1 I I I I M I I I I I I I I M I I I II I I I M I II M I II I I I I I I 
orf 141-1 GWTLMSIJ^AAYPAAFALMLPLPVLMFFRPWQSRRI^LTAVASLAFAiPLMTVYPLLLAKT 

orf 141ng-l .pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD 
I I M I M I I : I I I I I (I I I I I I : I I I I M : I ! I I I I I I I I : I II II I I I I I I I II I I I 
or f 1 4 1 - 1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 

orf 141ng-l - pep WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
I I I II : I I I I M I I I M I I I I I I I I I II I I I I M I ! I I I I I I I I I II II I I I I I I I I I I 
orf 14 1-1 WGILGVWMIAVLVLI^VNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 

orf 14ing-l .pep FGLFAVFLWTGFFAMNYGWPT^KLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 
I I I I I 1 I I I I I I I I I II M I I I I I M I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDI DPI PMAVAVL FT PLWLWAITRK 

orf 141ng-l .pep N I RG R QA VTN W AAG VT LT W A LLMT LFLPWLDAAKS HAP WRSME AS FSPE LKRELSDGIE 
I I I I I I I II I I I I I I I I I I I I I I ! I M I I I M I I I I I 1 I I I I II I I : I I 1 I I I I II I I I I 
orf 141-1 N I RGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPE LKRELSDGIE 

orf 141ng-l .pep CIGIGGGDLHTRI WTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
I I I I II II I I II I I I II I I I I I I 11 I I I : M I i I I I I M II I I I I I I I I I I II I I I I I I 
orf 14 1-1 CIGIGGGDLHTRI WTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orf 141nc-i .pep SKFALIRKIGENILKTTDX 

I I I 1 I II I I II II 
orf 141-1 SKFALIRKIGENIX 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
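
The text does not state which program was used to predict the transmembrane domains. One common approach is a sliding-window hydropathy scan. The sketch below assumes the Kyte-Doolittle scale, a 19-residue window and a threshold of 1.6; these are illustrative choices, not the method of the source, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every window; unknown residues (e.g. X) score 0."""
    seq = "".join(seq.split()).rstrip("*").upper()
    scores = [KD.get(aa, 0.0) for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean reaches the threshold."""
    return [i for i, mean in enumerate(window_hydropathy(seq, window))
            if mean >= threshold]
```

Runs of consecutive start positions returned by `hydrophobic_windows` mark candidate membrane-spanning segments that would then need confirmation by a dedicated predictor.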



Example 72 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>:

1 ..CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG
51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA
101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA
151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>: 

1 ..QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA
51 SGFQVGYTF*



Further work revealed the complete nucleotide sequence <SEQ ID 605>:

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC
51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT
101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT
151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT
201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG
251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT
301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC
351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG
401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA
451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA
501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG
551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT
601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC
651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG
701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG
751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA
801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC
851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG
901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC
951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA
1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA



This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>:

1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH
51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN
101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE
151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD
201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL
251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG
301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF*



Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. gonorrhoeae

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from
N. gonorrhoeae:



orf142.pep                                QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY 30
                                          |||||||||||:||||||||||||||||||
orf142ng    RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313

orf142.pep  DIFTGRALKKPEFFQSRKWASGFQVGYTF 59
            ||||||||||||:||::||::||||||:|
orf142ng    DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
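
The identity figures quoted for each comparison can be recomputed from the aligned sequences. The following is a minimal Python sketch, assuming the percentage is the number of identical columns divided by the overlap length (this reproduces 52/59 for the alignment above); the function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns holding the same residue.

    Both inputs must already be aligned (equal length, '-' for gaps);
    a gap never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# The 59-residue overlap between ORF142 and ORF142ng shown above.
orf142 = "QSAKWLSGQTLVGTAIGIRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"
orf142ng = "QSAKWLSGQTLAGTAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF"
print(round(percent_identity(orf142, orf142ng), 1))  # 88.1
```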

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is:

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC
51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT
101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT
151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT
201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG
251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC
301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC
351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG
401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA
451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA
501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG
551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT
601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC
651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG
701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG
751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA
801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC
851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG
901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC
951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH
51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN
101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE
151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD
201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL
251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG
301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF*

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the
C-terminal end of outer membrane proteins.
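
The C-terminal motif mentioned above can be tested programmatically. The following is a minimal Python sketch, assuming that "aromatic" means Phe, Trp or Tyr and that the motif occupies the last three residues before the stop symbol; both points are assumptions, and the function name is hypothetical.

```python
AROMATIC = frozenset("FWY")  # assumption: Phe, Trp and Tyr count as aromatic


def has_cterminal_aromatic_motif(protein):
    """True if the last three residues read aromatic-Xaa-aromatic.

    Spaces from the 10-residue blocks and a trailing '*' are ignored.
    """
    seq = "".join(protein.split()).rstrip("*").upper()
    return len(seq) >= 3 and seq[-3] in AROMATIC and seq[-1] in AROMATIC


# Last block of the ORF142ng protein listing, ending in Y-S-F.
print(has_cterminal_aromatic_motif("KWVTGFQVGY SF*"))  # True
```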



ORF142ng and ORF142-1 show 95.6% identity over a 342aa overlap:

orf142-1.pep  MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA
              |||||||||||||||||||||||:|||||||||||||||||||||:||||||||||||||
orf142ng-1    MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA

orf142-1.pep  VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:
orf142ng-1    VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS

orf142-1.pep  VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA
              |||| |||||||||||||||||||:||||||||| ||||||||||||||:||||||||||
orf142ng-1    VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA

orf142-1.pep  PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf142ng-1    PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

orf142-1.pep  VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG
              |||||||||| |||||||||||||||||||||||||||||||||||||||||||:|||||
orf142ng-1    VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG

orf142-1.pep  IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF
              |||||||||||||||||||||||||:||::||::||||||:|
orf142ng-1    IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF

In addition, ORF142ng is homologous to the HecB protein of E.chrysanthemi: 

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558
Score = 119 bits (295), Expect = 3e-26

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%)

Query: 2 DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61
DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS---------SRFATSHDAESLQAG 280

Query: 62 HYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLSV 121
+ S P+G W RY + G S F +R+++RD KT ++
Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339

Query: 122 KLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181
R +Y++ + L RK + ++H + A F Y G +
Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399

Query: 182 EEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241
+++ E + WT SA P Y S++ Q++ L ++L +GG ++
Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV---TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP----GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296
RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G
Sbjct: 457 RGF-REQYTSGNRGAYWRNELNWQAWQLPVLGNVTFMAAVDGGHLYNHKQDNSTAASLWG 515

Query: 297 TAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
A+G+ + L + G + P + Q V G++VG SF
Sbjct: 516 GAVGMTVASRW L S QQVT VG W P I S Y P AWLQ P DTMWG YRVGL S F 558

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 73

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>:

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCTTACTTG GgCGGACACC
51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA
101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG
151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG
201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA
251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG
301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC.

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME
101 KKYRLLIKNN ..

Further work revealed the complete nucleotide sequence <SEQ ID 611>:

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA
101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT
151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>:

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS
51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRILYRRY SNRV*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)
ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N.
meningitidis:

                                             10        20        30
orf143.pep                           MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL
                                          |: :  ||| ||||||||||||||||||||
orf143a     GAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTADIDTALNLLYRLQKLEFL
            20        30        40        50        60        70

               40        50        60        70        80        90
orf143.pep  YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE
            ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf143a     YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE
            80        90       100       110       120       130

              100       110
orf143.pep  VAQMEKKYRLLIKNN
            |||||||||| ||||
orf143a     VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIGSTKFILVIGGIPDLGKEA
           140       150       160       170       180       190

The complete length ORF143a nucleotide sequence <SEQ ID 613> is:

1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA
101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT
151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT
651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA

This encodes a protein having amino acid sequence <SEQ ID 614>:

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV

201 TLVRXLYXXL QQPRVKLGRE XGLCSNY* 

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap:

orf143a.pep  MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA
             |||| ||||||| |||||||||||||| ||||||||||||||||||||||||||||| ||
orf143-1     MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA

orf143a.pep  DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf143-1     DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA

orf143a.pep  NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf143-1     NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG

orf143a.pep  STKFILVIGGIPDLGKEAFVTLVRXLY
             |||||||||||||||||||||||| ||
orf143-1     STKFILVIGGIPDLGKEAFVTLVRILY



Homology with a predicted ORF from N. gonorrhoeae 

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae:

orf143.pep  MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL 60
            |||||||||||: ||||||||||||||||||||||||||||||||||| |||||||||||
orf143ng    MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60

orf143.pep  SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 110
            ||||||||||||||||||||||||:||||||||||||||||||||||:||
orf143ng    SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120

An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino
acid sequence <SEQ ID 616>:

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME
101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD
151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA
201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV
251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI*



Further work revealed the following gonococcal DNA sequence <SEQ ID 617>:

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA
101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT
151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT
201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC
251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA
301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA
351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT
401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG
451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG
501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA
551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT
601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA



This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>:

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS
51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME
101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR
151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT
201 LVRILYRRYS NRV*

ORF143ng-1 and ORF143-1 show 95.8% identity in 214 aa overlap:

orf143ng-1.pep  MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEVVSSEKLLA-ADTA 59
                ||||||||||||| ||||||||||||||||||||||:|||||||||:|||||||: ||||
orf143-1        MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60

orf143ng-1.pep  DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf143-1        DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120

orf143ng-1.pep  NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 179
                |||||||:||||||||||||||||||||||:|||||||||||||||||||||||||||||
orf143-1        NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180

orf143ng-1.pep  STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213
                ||||||||:|||||:|||||||||||||||||||
orf143-1        STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 61 9>: 
1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 
101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 
251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 
401 CCGTGGATG.. 

This corresponds to the amino acid sequence <SEQ ID 620; ORF144>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM... 
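
The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name translate is illustrative and is not part of the original disclosure. Codons containing an unresolved base (for example r, w, y or a dot in the listing) are reported as X, as in SEQ ID 620.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with unresolved bases become 'X'."""
    seq = "".join(dna.split()).upper()      # drop the spaces between 10-mers
    usable = len(seq) - len(seq) % 3        # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 bases of SEQ ID 619, as printed in the listing above.
print(translate("ATGACCTTTT TACAACGTTT GCAAGGTTTG"))  # MTFLQRLQGL
```

The printed result agrees with the first ten residues of ORF144. A stop codon is returned as an asterisk, matching the convention used in the listings.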

Further work revealed the complete nucleotide sequence <SEQ ID 621>: 
1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 
101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 
251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 
401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 
451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 
501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 
551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 
601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 
651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 
701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 
751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 
801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 
901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 
951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 
1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 
1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 
1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 
1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 
1201 CAGGCGAAAA AACGGCAGTA G 



This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV 
201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 
301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKRQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

or f 1 4 4 .'pep MTFLQRLQGLADNKICAFA WFWRRFDEERVPQXAASMTFTT LLALVPVLTVMVAVASI F 
I I I I I I I I I! it I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I | I I | | | | | 
O r f 1 4 4 a MTFLQRLQGLADNKI CAFA W FWRRFDEERV PQAAASMT FTT LLALVPVLTVMVAVAS I F 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 14 4 .pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANR LTAIGSVMLWTSLML IRTID 
I I I I I I I I I I I I I 1 1 I I I I i I II I I I I I I I I II I I I I I 1 I ! I I I I I I I I I I I I I I If I 
orf 14 4a PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANR LTAIGSVMLWTSXML IRTID 

70 80 90 100 110 120 

130 

orf 144. pep NTFNRIWRVXXQRPWM 
! I I I ! ! I I I I i I I 1 

orf 14 4a ' NTFNRIWRVNSQRPWMMQFLVYW ALLTFGPLSLGVGI S FXV GSVQDAALASGAPOWSGAL 
130 140 150 160 170 180 

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 
101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 
251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 
401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 
451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 
501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 
551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 
601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 
651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 
701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 
751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 
801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 
901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 
951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 
1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 
1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 
1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 
1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 
1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 

This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV 
201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL 
301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA 
401 QAKKQQQS* 

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 

orf 14 4a .pep MT FLQRLQGLADNKI CAFAWFWRRFDEERVPQAAASMTFTT LLALVPVLTVMVAVAS I F 

I I I I M II I M I I I I I I I I I I II I I II I M I I I I I I I I I I M I M II i I I I II I I I I I I I 
orf 14 4-1 - MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orfl4 4a.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSXMLIRTID 

II II I (I M I I I II I I I I I I I I I I I I I M I I I I I I I I I I I I I I If I I II I 1 I II I I III 
orf!4 4-l PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

orf 14 4a. pep NTFNRIWRVNSQRPWMMQFLVYWALLT FGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 

I ! I I I I I I I I II I I II I II I I I I II I I I 1 I II I I I I I II I II I I I I II M I I I I II I M 
orf 14 4-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf 14 4a . pep RTAATLX FMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTW YMGNFDGYRS 
I I I II I : I I I I I I I II M II I I I I I II I I I I II I I I I I I I M II I I I I II I I I I I I II 
orf 14 4-1 RTAATLT FMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

orf 14 4a. pep I YGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL 

I I II II I I II II I I I I I I I I I II I I I II I I I II 1 I I I I II I I I I I I I I I I I I I I I I I I I 
orf 14 4-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orf 14 4a. pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYI YSGRQGWVLKTGADSIELNEL 

I I M M I I I I I II I I I I I I I I I I I I I I I I I I II I II I I I II I I I I I I I I I I I I I I I I M 
orf 14 4-1 DAAQKEGKALPVQE FRRHINMGYDELGELLEKLARHGY I YSGRQGWVLKTGADSIELNEL 

orf 14 4a. pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 4 08 

II II I II II II I I M I I ! I I I I 1 I II I I I I I I I I I I I I I M I I : I 

orf 14 4-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406 
Homology with a predicted ORF from N. gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N. gonorrhoeae: 

5 orf 1 4 4 . pep MTFLQRLQGLADNKICAFAWFWRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60 

INN II I I I I I I I I I 1 I I : 1 I I : I I I I I I I I I I I I M I I II I I I I II I I I I II I I 
orf 14 4ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60 

orf 14 4 .pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 120 
10 I II M II I | I | | I | | | | | M I I I I II I I I : II I : II I I I I I I I I I I I I I I I I I 1 I II I 1 

orf 14 4ng PVFDRWS DS FVS FVNQT I VPQGADMVFD Y I DAFRDQANRLT AI GS VMLVVT S LML I RT I D 120 

orfl44.pep NTFNRIWRVXXQRPWM 136 
I : I I I I I I t : II I I I 

15 orf 14 4ng N AFNRI WRVNTQRPWMMQFLVYWALLTFGPLS LGVG I S FMVG S VQDS VLS SG AQQWADAL 180 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 

1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFN VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 
351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 

1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 
51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 
101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 
151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 
251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 
301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 
351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 
401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 
451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 
501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 
551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 
601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 
651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 
701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 
751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 
801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 
901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 
951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 
1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 
1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 
1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 
1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 
1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-1>: 

1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 
151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF 
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 
351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKQQQS* 

ORF144ng-1 and ORF144-1 show 94.1% identity in 406 aa overlap: 

orf 144IMJ-1 . Pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

5 orri^ng p f , | , | M I ! I I I I I I I : I I I : I I I I I II I I I I I M II I I II II Ml) 

orf 14 4-1 MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orfl4 4na-l pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 
M | | 1 I I I t I M M I I I I M I t I I I IN I I : I M : I I I I I I I I I I M I M I I I I I I I j I I 
\Q orf 14 4-1 pvFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

orfl4 4nq-^ pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 
| : | | M I I i I : I I I t I I I I I M I I I I ! I II I I I I I I I I I I I I I I I I :: I : I I I II: 11 
orf 14 4-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 



15 



or^l4 4nq-l pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 
: | | | | : | 1 | | I I I I I I I I I I I M I I I II I II i I I I I I I I I I I 1 I I M I I I I I I I I I I 
or^l4 4-l RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 



20 or^4 4na-l pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

M I I I II I I I I I I I M I I I 11 I I I I I I I I I I I I I I I I I I I I I M I I M M I I I I I I I M I 
orf 14 4-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

o^l4 4nq-l pep DAAQKEGRTLSVQE FRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 
?5 " | | | | | | | : : | I I I M I i I 1 I I M M I M I I I I 1 I : I I I I I I I M I I I I M I 1 M I I: I I 

orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

O^-f *» 4 4ng-l . pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS 
t | | M I I I I I I I M I I I M I I II I I I I II I I I I I I I I I I I M I I : I 
30 orf 1 4 4 - 1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 

On the basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
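
The text does not state how the putative transmembrane domains were identified. As an illustration only, the sketch below applies a sliding-window mean of the Kyte-Doolittle hydropathy scale, a commonly used screen for candidate membrane-spanning segments. The window length (19) and cut-off (1.6) are assumed conventional values, the function names are hypothetical, and the result is not presented as the analysis actually performed.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average hydropathy of a segment; unknown residues (e.g. X) count as 0."""
    return sum(KYTE_DOOLITTLE.get(r, 0.0) for r in segment) / len(segment)


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based (start, end) pairs of windows at or above the threshold."""
    hits = []
    for start in range(len(protein) - window + 1):
        if mean_hydropathy(protein[start:start + window]) >= threshold:
            hits.append((start + 1, start + window))
    return hits


# Residues 31-60 of ORF144ng-1 (SEQ ID 628); positions printed are
# relative to this 30-residue segment, not to the full protein.
segment = "VPQAAASMTFTTLLALVPVLTVMVAVASIF"
print(hydrophobic_windows(segment))
```

Overlapping windows above the cut-off would normally be merged into a single candidate segment and confirmed with a dedicated topology predictor.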

Example 75 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 629>: 

1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 
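
The sequence listings in these Examples are printed as rows of ten-residue blocks, each row preceded by the position of its first residue, with ".." marking a truncated end of a partial sequence. A minimal sketch for joining such rows into one contiguous string is given below; the helper name join_listing is hypothetical, and the handling of ".." is an assumption based on the layout used here.

```python
import re


def join_listing(rows):
    """Concatenate numbered listing rows into one contiguous sequence.

    The leading position number and all whitespace are removed. Dots at the
    very start or end of a row (truncation marks) are stripped, while dots
    inside a row, which denote unresolved residues, are kept.
    """
    parts = []
    for row in rows:
        body = re.sub(r"^\s*\d+\s+", "", row)   # drop the position number
        body = re.sub(r"\s+", "", body)         # drop spaces between blocks
        parts.append(body.strip("."))           # drop truncation marks only
    return "".join(parts)


# Last two rows of SEQ ID 629, as printed above.
rows = [
    "151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG",
    "201 CCTGCTTGAA ACACGGGAAC ACGGCTGA",
]
print(len(join_listing(rows)))  # 78
```

One limitation of this sketch is that an unresolved residue printed as a dot at the extreme end of a row would be stripped together with the truncation marks.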

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 

1 ..RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR 
51 TRRKWLDAHE RQHLRQSLLE TREHG* 

Further work revealed the complete nucleotide sequence <SEQ ID 631>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 
51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 
101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 
151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 
201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 
251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 
301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 
351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 
401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 
451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 
551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 
601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 
651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 
701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 
751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 
801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 
51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 
101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 
201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 
351 TRRKWLDAHE RQHLRQSLLE TREHG* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of N. 
meningitidis: 

10 20 30 

orf 14 6. pep RHARRIRIDTAINPELEALAEHLHYQWQGF 

II I I I I I I II I II I I I I M I I I I I I I M I I 
30 orf 14 6a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 

280 290 300 310 320 330 

40 50 60 70 

orf 14 6. pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 
35 I I M I : I I I II II I I I M I I I I II I II I I I I I I I I I M I II I M : 

orf 14 6a LWLSTNMRQE I SALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 

340 350 360 370 

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 
51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 
101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 
151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 
201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 
251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 
301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 
351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 
401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 
451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 
551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 
601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 
651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 
701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 
751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 
801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 
This encodes a protein having amino acid sequence <SEQ ID 634>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 
51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH 
101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG 
201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 
351 TRRKWLDAHE RQHLRQSLLE TREHS* 



ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 

orf146a.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf146-1    MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 

orf146a.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf146-1    LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

orf146a.pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
            ||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||| 
orf146-1    VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf146a.pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
            |||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf146-1    FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

orf146a.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf146-1    AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

orf146a.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf146-1    RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf146a.pep RQHLRQSLLETREHSX 
            ||||||||||||||: 
orf146-1    RQHLRQSLLETREHGX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 

orf146.pep                                RHARRIRIDTAINPELEALAEHLHYQWQGF 30 
                                          |||||||||||||||||||||||||||||| 
orf146ng    KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364 

orf146.pep  LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 75 
            |||||:|||||||||| |||||||||||||||||||||||||||| 
orf146ng    LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG 409 

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDMMSSQR KRLSGRWLNS 
51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF 
101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA 
151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 
201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR 
251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 
301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 
351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 
401 SLLETREHG* 



Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 
1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 
51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 
101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 
151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 
201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 
251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 
301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 
351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 
401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 
451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 
551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 
601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 
651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 
701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 
751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 
801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 



This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>: 



1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 
51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH 
101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG 
201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 
351 TRRKWLDAHE RQHLRQSLLE TREHG* 

ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap: 



orf 14 6-1 - pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 
I I : I II : I I : I I I I I I I I I I : I I I I I M I I I I : I I I I I I I I M ! I I I I I I I I I II I I I 
orfl4 6ng-l MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFW 

orf 14 6-1 . pep LGMLQFQGAI YSKAVERMLGTV I GLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
I I II It I I I I I I : I I I I I I I I M I I II I I I 1 I I I I I I I I I I I I I I I I I : I I II I I M I I I 
orf 14 6ng-l LGMLQFQGAI YSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLT I GTASALAGWAA 

orf 14 6-1 .pep VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
I I I I I II I I I I I I I I I I II I I I I I I I I I II 1 II I I I I 1 I I I II I M I I H I I II I I I I I I 
orfl4 6ng-l VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf 14 6-1 .pep . FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
I I M II II I I I I I I I I I I M I I I I I 1 I I I : I I : I I I I M I I I I I! I II I I I I I I I I II I I 
orf!4 6ng-l FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 

orfl4 6-l .pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
: I I I I I I I i I I I I I I I M I I I I I M I I I I I I I II I I I I I I I I I I I I I I I I M I I : I It I I 
orfl4 6ng-l SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

orf 146-1 .pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I II I I I I I I I I I I I I t I I I I i I 1 I I II M I I I I I I I I I I I I I I I I I I I I I I M I I M I I 
orfl4 6ng-l RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf 14 6-1. pep RQHLRQSLLETREHGX 
I I I I I I I I I I I I I I I I 
orfl46ng-l RQHLRQSLLETREHGX 
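
The identity figures quoted in this Example can be reproduced from the aligned sequences. The sketch below is an illustration under stated assumptions: percent identity is taken as the number of identical aligned positions divided by the number of aligned columns, the two inputs are already aligned and of equal length, and the function name is hypothetical. It is applied to the 75 aa overlap between ORF146 and ORF146ng described above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# The 75 aa overlap between ORF146 (SEQ ID 630) and ORF146ng (SEQ ID 636).
orf146 = (
    "RHARRIRIDTAINPELEALAEHLHYQWQGF"
    "LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG"
)
orf146ng = (
    "RHARRIRIDTAINPELEALAEHLHYQWQGF"
    "LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG"
)
print(round(percent_identity(orf146, orf146ng), 1))  # 97.3
```

The two sequences differ at two positions (D/N and L/P), giving 73 identical residues out of 75, in agreement with the 97.3% stated in the text.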

Furthermore, ORF146ng-1 shows homology with a hypothetical E. coli protein: 



sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION 
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt 
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839) 
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 




>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

YRHRRLI HAVRLGGT VL FAT ALARLLHLQHGEW I GMTVFWLGMLQFQGAI Y SN AVERML 7 9 
YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 



GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++ 

GTVLGSILGLIALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAI W 131 



15 - - G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ 



20 



25 



Query : 


20 


Sbjct : 


15 


Query: 


80 


Sbjct : 


75 


Query: 


140 


Sbjct: 


132 


Query: 


200 


Sbjct: 


191 


Query: 


260 


Sbjct: 


248 


Query: 


317 


Sbjct : 


306 



+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 



LN ++R D AL G +N 



EALAEHL--HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 



On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 76 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639>: 

1 ..GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 
51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 
101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 
151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 
201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC.GCGGTGA 
251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 
301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 
351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 
401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 
451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 
501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 
551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 
601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 
651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 
701 CTTTGTACGA T.. 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 

1 ..AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 
51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 
101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 
151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 
201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD.. 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 
151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 
251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 
351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 
401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 
451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 
501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 
551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 
601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 
651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 
751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 
801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 
851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 
51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 
101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 
151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 
201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 
251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 
ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 

25 Orfl47: 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPG 60 

AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 

AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 
30 L R RE F + GF+P KS RR 



35 



40 



Orfl47 : 


1 


Orf286: 


43 


0rfl47: 


61 


Orf286: 


103 


Orfl47: 


121 


Orf286: 


163 


Orfl47 : 


180 


Orf286: 


223 



++ +E+ HR+ +L D+ + E R + + LARE+TKT+ET VGE+ + D + 



+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 27 8 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A of N. meningitidis: 

10 20 30 

orf 14 7 . pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
45 I I M I I I ! I I I I I It I I I I I I I I I ! I I I I i 

orf75a TLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
20 30 40 50 60 70 

40 50 60 70 80 90 

50 orf 147 . pen MADKIVGYLSDGMWAQVSDAGTPAVCDPGAKL.ARRVREAGFK WPWGAXAVMAALSVA 

M I II I I I I M I I II I I I I I I I I I I I I I I I I I M I I I I I : I I I I I I I I I I I I I II I It I 
orf 7 5a MADKIVGYLSDGMWAQVSDAGTPAVCDPGAK1ARRVREVGFK VVPVVGASAVMAALSVA 
80 ' 90 100 110 120 ■ 130 

55 100 110 120 130 140 150 

orf 147 .pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 
II I M M M I I I I II I I 1 II I I I M I I I : M I :! M II I I I I II : I M I II I I I I I M I 
orf 75a GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIGATLADMAELFPERRLM 
140 - 150 160 170 180 190 



60 



160 170 180 190 200 210 

orf 14 7 .pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 






orf75a 



orf 147 .pep 
orf 7 5a 
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| | | | I | I M I I I I i I 1 I i I M I I I : I I I : I I I I I I I I I I M It I I I I II I M M I I I I I I 
LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
200 210 220 230 240 250 

220 230 
LTAELPTKQAAELAAKITGEGKKALYD 

M I I II II I I M I I I I I I I I I M M I I 
LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
260 270 280 290 



ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75.
Homology with a predicted ORF from N. gonorrhoeae

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from N. gonorrhoeae:






orf 147 .pep 
orf 147ng 
orf 147 . pep 
orf 147ng 
orf 147 .pep 
orf 147ng 
orf 147 . pep 
orf 147ng 
orf 147 . pep 
orfl47ng 



AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 

I M I II I I II I I I I I I II : I I M I t I I I I I 
TLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 



30 



85 



90 



MADKIVGYLSDG^^VVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 
| I I I :: I : I I I I : M I I 1 I I I I M I I I I I I I I I I I I I I I II I I I I I I I II MINIMI 
MADPCVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPWGASAVMAALSVA 14 5 

GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 
II M I I 1 I M i M I I I II 1 I I II I 1 II I I M : M I I I M I II I : I i I I I I I I I I M M 
GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATLADMAELFPERRLM 205 

LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 

I II I II M II I M I I ! I I I M II I : I I I M I I M I I I I M I I M I It II I I M I M Mi 
LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 2 65 

LTAELPTKQAAELAAKITGEGKKALYD 237 
I : I II I I I I M I I! I I I II I I M I M I 

LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 



An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino acid sequence <SEQ ID 644>:



1 MSVFQTAFFM FQKHLQKASD SWGGTLYVV ATPIGNLADI TLRALAVLQK
51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLW
101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PWGASAVMA ALSVAGVAES
151 DFYFNGFVPP KSGERRKLFA KWVRAAFPW MFETPHRIGA TLADMAELFP
201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE
251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK
301 *



Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 



1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC
101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG
151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT
201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT
251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG
301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG
351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA
401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG
451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC
501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG
551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA
651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG
701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG
751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC
801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT
851 TGGCACTGTC GTGGAAAAAC AAATGA
This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>:

1 MFQKHLQKAS DSWGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGF KV VPWGASAVM AALSVA GVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 
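
The correspondence between a nucleotide sequence and the amino acid sequence it encodes can be checked by translating the DNA codon by codon. The following is a minimal sketch, assuming the standard genetic code and assuming that any codon containing an ambiguous or unreadable base is reported as X; the function name `translate` and that convention are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (third position varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1.

    Whitespace is ignored, so the 10-base blocks used in the listing can be
    pasted directly. Translation stops after the first stop codon ('*').
    Codons that are not in the table (ambiguous bases) become 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First 30 bases of <SEQ ID 645>
print(translate("ATGTTTCAGA AACACTTGCA GAAAGCCTCC"))  # MFQKHLQKAS
# Final in-frame codons of <SEQ ID 645>
print(translate("TCG TGG AAA AAC AAA TGA"))  # SWKNK*
```

The first call reproduces the first ten residues of <SEQ ID 646>, and the second reproduces its last five residues followed by the stop symbol.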

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION (F286)
>gi|606086 (U18997) ORF_f286 [Escherichia coli]
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region [Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)






Query: 


4 


Sbjct : 


2 


Query: 


64 


Sbjct: 


60 


Query: 


124 


Sbjct: 


120 


Query: 


184 


Sbjct : 


180 


Query : 


243 


Sbjct: 


239 



KHLQKASDSWGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
KQHQSADNSQ — GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

GRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPV 123 
RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 



ALS AG+ F + GF+P KS RR ++ +E + HR+ +L 



30 " D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ 



EL A + +L AELP K+AA LAA+I G K ALY AL 



Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
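
The identity figures quoted in this example (for instance "Identities = 128/284 (45%)") are ratios of matching aligned positions to the length of the alignment. The sketch below shows that arithmetic under stated assumptions: the two inputs are already aligned to equal length, `-` marks a gap, and the alignment length including gap columns is used as the denominator. The function names and the toy sequences are hypothetical and are not taken from the alignment program used for the analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the
    same residue. Columns where either sequence has a gap never match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def as_percent(count: int, length: int) -> int:
    """Round a count/length ratio to a whole percentage."""
    return round(100.0 * count / length)


# Figures reported for ORF147ng against the E. coli protein
print(as_percent(128, 284))  # identities
print(as_percent(171, 284))  # positives
print(as_percent(4, 284))    # gaps

# Toy alignment (hypothetical): 4 identical columns out of 6
print(percent_identity("ACDE-F", "ACDQGF"))
```

The three `as_percent` calls give 45, 60 and 1, in agreement with the percentages reported above.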

Example 77 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647>:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA
51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC
301 GTGGCGGcAT TGGTGGGCGt ATCAATATAT TGTGAGCGTG GCACATAACG
351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC.GAT
401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG
451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA
501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG
551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC
601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA.
651 GTTCATATCA TATTGCAAGT
701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA
751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA
801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC
851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA
901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC
951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG
1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC
1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT
1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT
1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA
1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA
1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC
1301 GGCAAAGGCA CGCTG
//
2101 . . . GATAAAG
2151 TGACTGCTTC ATTGACTAAG ACCGACATCA GCGGCAATGT CGATCTTGCC
2201 GATCACGCTC ATTTAAATCT CACAGGGCTT GCCACACTCA ACGGCAATCT
2251 TAGTGCAAAT GGCGATACAC GTTATACAGT CAGCCACAAC GCCACCCAAA
2301 ACGGCAACCk TAgCCtCGtG G.sAATGcCC AAGCAACATT TAATCAAGCC
2351 ACATTAAACG GCAACACATC GGCTTCgGGC AATGCTTCAT TTAATCTAAG
2401 CGACCACGCC GTACAAAACG GCAGTCTGAC GCTTTCCGGC AACGCTAAGG
2451 CAAACGTAAG CCATTCCGCA CTCAACGGTA ATGTCTCCCT AGCCGATAAG
2501 GCAGTATTCC ATTTTGAAAG CAGCCGCTTT ACCGGACAAA TCAGCGGCGG
2551 CAagGATACG GCATTACACT TAAAAGACAG CGAATGGACG CTGCCGTCAg
2601 GarCGGAATT AGGCAATTTA AACCTTGACA ACGCCACCAT TACaCTCAAT
2651 TCCGCCTATC GCCACGATGC GGCAGGGGCG CAAACCGGCA GTGCGACAGA
2701 TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG CCGTTCCCTA TTATmCGTTA
2751 CACCGCCAAC TTCGGTAGAA TCCCGTTTCA ACACGCTGAC GGTAAACGGC
2801 AAATTGAACG GTCAGGGAAC ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA
2851 CCGCAGCGAC AAATTGAAGC TGGCGGAAAG TTCCGAAGGC ACTTACACCT
2901 TGGCGGTCAA CAATACCGGC AACGAACCTG CAAGCCTCGA ACAATTGACG
2951 GTAGTGGAAG GAAAAGACAA CAAACCGCTG TCCGAAAACC TTAATTTCAC
3001 CCTGCAAAAC GAACACGTCG ATGCAGGCGC GTGG
//
3551 .... TTAGAC
3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG
3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA
3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC
3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT
3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA
3851 TCGACAGGTT CTACATCGGC ATCAGnCGCG GGCGCGGGTT TTAGCAGCGG
3901 CAGCCTTTcA GACGGCATCG GAGsmAAAwT CCGCCGCCGC GTGCtGCATT
3951 ACGGCATTCA GGCACGAtAC CGCGCCGgtt tCggCGgATt CGGCATCGAA
4001 CCGCACATCG GCGCAACGCg CtATTTCGTC CAAAAAGCGG ATTACCGCTA
4051 CGAAAACGTC AATATCGCCA CCCCCGGCCT TGCATTCAAC CGcTACCGCG
4101 CGGGCATTAa GGCAGATTAT TCATTCAAAC CGGCGCAACA CATTTCCATC
4151 ACGCCTTATT TGAGCCTGTC CTATACCGAT GCCGCTTCGG GCAAAGTCCG
4201 AACACGCGTC AATACCGCCG TATTGGCTCA GGATTTCGGC AAAACCCGCA
4251 GTGCGGAATG GGgCGTAAAC GCCGAAATCA AAGGTTTCAC GCTGTCCCTC
4301 CACGCTGCCG CCGCCAAAGG CCCGCAACTG GAAGCGCAAC ACAGCGCGGG
4351 CATCAAATTA GGCTACCGCT GGTAA


This corresponds to the amino acid sequence <SEQ ID 648; ORF1>:



1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSWSRNG
101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG
151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA
201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ
251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK
301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA
351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE
401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL. .
//
701 .... DKVTAS
751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNX
801 SLVXNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS
851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGXEL
901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLXVTPPT
951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN
1001 NTGNEPASLE QLTWEGKDN KPLSENLNFT LQNEHVDAGA W

1151 . LDRVFAEDR
1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN
1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG
1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT
1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV
1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW
1451 *

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT

451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT

501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 

551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 

601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC

901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG

1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 

1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 

1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 

2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA

2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA

2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 

2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG

2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC

2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 

2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 

2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA

2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC

3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 

3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 

3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 

3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 

3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 

3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 

3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG
3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC
3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC
3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG
3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC
3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG
3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA
3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC
3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT
3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA
3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC
3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA
3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC
4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC
4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA
4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA
4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG
4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC
4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC
4351 ATCAAATTAG GCTACCGCTG GTAA

This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>:

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGIL PQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSWSRNG 

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NT FAQNG SGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 

351 LPNRLKTRTV QLFNVSLSET ARE PVYHAAG GVNSYRPRLN NGENISFIDE 

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAWSRNVAK 

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 

1001 NTGNEPASLE QLTWEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG 

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 

1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG

1451 IKLGYRW* 

Computer analysis of these sequences gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456aa overlap with an ORF (ORF1a) from strain A of N. meningitidis:



10 20 30 40 50 60 

MKTTDKRTTETHRKAPKTGR IRFXAAYLAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 
I M I I I I I ! I I I I I I I II I I I M ( I I I I I I I I $ II I It I I I I I I I I I II I I I I I I I I i 
MKTTDKRTTETHRKAPKTGR IRFSPAYLAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

KGKFAVGAKD IEVYKKKGELVGKSMTKAPM I DFSWSRNGVAALVGVQYI VSVAHNGGYN 



orf la 

orf 1 . pep 
orf la 

orf 1 . pep 
orf la ' 

orf 1 . pep 
orf la 



I I I I I I I I I I I I II I I I I I I I I I I I I I I ! II I I I II I I I I I M I I I I I I I I I M M II I 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 
70 80 90 100 110 120 

130 140 150 160 170 180 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

I I I I I I II I I II I : I : I I II I I I I : : 111:11 I I I I I II I 1 I I I II I I I I 
NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 

130 140 150 160 170 

190 200 210 
MDGRKYI DQNN YPDRVRI GAGRQ YWRS DEDEP NN 

II I I : : : I 1 : I I I I I : I : : I I I I : I : II 
MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 

180 190 200 210 220 230 

220 230 240 250 260 

RESSYH IA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

I : : : : I I I I I I I I II I : : I M : I I I I I I I II I : II 111 = 14 

SGDVRHAN DYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 
240 250 260 270 280 290 



25 



270 280 290 300 310 320 

orf 1 . pep DWFYDEI FAGDTHSVFYEPRQNGKYS FNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 

11111:1: 1111:1 : M I : I I : : I I : : : I I I I I : : : I : 1 I : I I : : II : I I : 
orf la DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
• 300 310 320 330 340 350 



30 



35 



40 



330 340 350 360 370 380 

orf 1 . dgd SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT 

I i : I I : I I I I I I I I I I : I I M I I I I II : I I I I I : I : I I I : : I I I I I I I I I II : I I I I 
orf la SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 

360 370 380 390 400 410 

390 400 410 420 430 

orf 1 . pep VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL 

I I I I I I II I I I I M I I I I I I 1 I 1 I I I II I I I I I I I I I I I M I 
orf la VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
420 430 440 450 460 470 



45 



orf 1 . pep 
orf la 



VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH 
480 490 500 510 520 530 



50 



orf 1 . pep 
orfla 



RIQNTDEGAMIXXHNATTTSTVT ITGNES ITQPSGKNINRLNYSKE I AYNGWFGEKDTTK 
540 550 560 570 580 590 



55 



orf 1 .pep 
orfla 



TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG 
600 610 620 630 640 650 



60 



65 



70 



orf 1 .pep 
orfla 

orf 1 .pep 
orfla 

orf 1. pep 



IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
660 670 680 690 700 710 

440 450 460 470 480 

XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

: I I : III M I I I I 1 II I I 1:1 I : I I I I I I I 

TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 
720 730 740 750 760 770 

490 • 500 510 520 530 540 

GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
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| | | M I i I I I I I I I I I Ml I I I 1 1 I I I I i I I l I : I I I M I I I I I : : I : II M I I I I 
GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD 

780 790 800 810 820 830 



orf la 

10 



550 560 570 580 590 600 

orfl pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 

' ' | | | || | | | | | | | 1 || | I I II I I I I I I : I I I I I I : I I : I I I I I I I I M I I I I I I : I M M 

NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 

840 850 860 870 880 890 



610 620 630 640 650 660 

orfl pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 
| | | M I II I I II M II I I I I I I I :: I : I II I I I I I II I I I I I I II I I I I I I I I I I 

orf la NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFNTLTVNG 

15 900 910 920 930 940 950 

670 680 690 700 710 720 

orfl pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL 
Ml I I I II I I I I I I I I M I I I I I I I I I I M I I I I I I M I I I I : II : II I I I I I I I I II I 
20 orf la KLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTWEGKDNKPL 

960 970 980 990 1000 1010 

730 740 750 

orfl . pep SENLNFTLQNEHVDAGAW 

25 ^ i | I I 1 I I II I I I I I I I I I I 

o^fla SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS 

1020 1030 1040 1050 1060 1070 



30 orfl. pep 



or f la LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP 
1080 1090 1100 1110 1120 1130 

35 760 

orfl. pep . LDR 

II I 

o^f la XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 
1140 1150 1160 1170 1180 1190 

40 

770 780 790 800 810 820 

o^f 1 peD VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
I | | I | I I I I I I I I II M M I M I I I II I 1 I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
orf la VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
45 1200 1210 1220 1230 1240 1250 

830 840 850 860 870 880 

orfl . pep T FDDG I GN S ARLAHGAVFGQYG I DRFY I G I SAGAG FS SG S LS DG I GXKXRRRVLH YG I QA 

: M I I M I I I I M I I I I I I I I I I II I i I i : I I I II I I MUM I I I I I I I I I I I I 
50 orf la XFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVLHYGIQA 

1260 1270 1280 1290 1300 1310 

890 900 910 920 930 940 

orfl . pep RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 

55 * ' I II I I I I I I I II I : U I I I I I I I I I II It I U I I I I I I I I I I I I II I I I I I M I I I I I I 

orf la RYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHX 
1320 1330 1340 1350 1360 1370 

950 960 970 980 990 1000 

60 orfl . pep SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP 

Mill I ! I I M I II I M M M I M I I II I I II I I M II M I II II I 1 I I 1 I 

orf la SITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP 
1380 1390 1400 1410 1420 1430 



65 1010 1020 

orfl. pep QLEAQHSAGIKLGYRWX 
I I I I I I I I I M II M M 
orf la QLEAQHSAGIKLGYRWX . . . 

1440 1450 

The complete length ORF1a nucleotide sequence <SEQ ID 651> is:
1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG
201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC
401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT
451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT
501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT
551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC
601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG
651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG
701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT
751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA
801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT
851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG
901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC
951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA
1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG
1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT
1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG
1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC
1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT
1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG
1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG
1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA
1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG
1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC
1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT
1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT
1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT
1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC
1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG
1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG
1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC
1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT
1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG
1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA
2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC
2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG
2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC
2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA
2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC
2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC
2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA
2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA
2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA
2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC
2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT
2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC
2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG
2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC
2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA
2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT
2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG
2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT
2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA
2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT
3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT
3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC
3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC
3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC
3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA
3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC
3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG
3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA
3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA
3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC
3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC
3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT
4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC
4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT
4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA
4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG
4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA
4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC
4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA

This encodes a protein having amino acid sequence <SEQ ID 652>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN
151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH
201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP
251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW
301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ
351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN
401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL
451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG
501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH
551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR
601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG
651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL
701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS
751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ
801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN
851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN
901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT
951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD
1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE
1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG
1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ
1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR
1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR
1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG
1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP
1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL
1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW*

A transmembrane region is underlined.
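The relationship between a nucleotide listing and the protein it encodes can be checked with a short script. The following is a minimal sketch, not the method used in this application: it assumes the standard genetic code, reads codons from a chosen reading frame, and reports any codon containing an ambiguous base (such as N) as X rather than trying to resolve it. The function name `translate` and its `frame` parameter are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a nucleotide listing into one-letter amino acid codes.

    Position numbers and spaces in the listing are ignored. Translation
    starts `frame` bases into the sequence and stops after the first stop
    codon, which is written as '*'. Codons with ambiguous bases give 'X'.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


if __name__ == "__main__":
    last_row = "4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA"
    print(translate(last_row, frame=2))
```

For example, the final row of the nucleotide sequence above (positions 4301 onward) begins two bases before a codon boundary; `translate(last_row, frame=2)` gives LEAQHSAGIKLGYRW*, which agrees with the end of <SEQ ID 652>.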



ORF1-1 shows 86.3% identity over a 1462 aa overlap with ORF1a:

10 20 30 40 50 60 

orfla.pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
M | | | | | I | I II I I I I I I I I M I I I 1 M I I I M I I I I I I I II I I I I I I I I M I I I I I I I I 
orf 1-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGIN YQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf la .pep • KGKFAVGAKD IEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQY I VSVAHNGGYN 

I M I 1 M I I I I I M 1 I I I I I II I I I I I M I I I I I M I I II I I I I I I I I II M I M I I I I I 
orf 1-1 KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQY I VSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf la . pep NVDFGAEGXN PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTS DM 

I || I I I II 1111111:1:11111111 : : I M : I I I I II If I I 1 I I 1 II I 1 I II I 
orf 1-1 NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 
In each block below, the upper sequence is orf1a.pep and the lower sequence is orf1-1.



180 190 200 210 220 230 
RGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDL — SYSGA WLIGGNTHMQGWGNN 

I | | : : : I I : I I I I I : I : : I 11 I : I : : : I I I 11:1111 ' : : : : 
DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

190 200 210 220 230 240 

240 250 260 270 280 290 

GVXSLSGD-VRHANDYGPMPIAGAAGDSGSPMFIYDKTNNECWLLNGVLQTGYPYSGRENG 

| : : | : :: : : I : II : I : I : I I I I I I I I I I I :: I I I : I I I I 1 I I M I : II 
GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNG 

250 260 270 280 290 

300 310 320 330 340 350 

FQLIRKDWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQT 

| I I : I M I I I I : I : I I I I : I : I I I : I I : : I I : : : M I M : : : I : I I : I I : : I 
FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 

300 310 320 330 340 350 

360 370 380 390 400 410 

VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 

| : I I : ||:|| Mill I I I 1 I I : I II I II I I M : I I I I I : I : M I : : I I I I I M M I 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 

360 370 380 390 400 410 

420 430 440 450 460 470 

FEGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 

I : I I | I | I I M II I I I I I I I I I I I I I I I I I I I M I I I M M I I I I I I I I M I I I I I I 1 I I 
FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 

420 430 440 450 460 470 

480 490 500 510 520 530 

SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

I I M I I I I M II I I I I I I I I I I II I I I I M II I I 1 1 I I I I I I I II I M I I I I II I I I M 
SVGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
480 490 500 510 520 530 

540 550 560 570 580 590 

HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 

I I M I I I I I I I M I I I I II I I ! M I I I : : I : : I : I I I : : I M II II II I 

HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIAT-TGNN-NSLDSKKEIAYNGWFG 

540 550 560 570 580 590 

600 610 620 630 640 650 

EKDTTPCTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

I I I I I I I I I I I II I I i I I I I I I I I I I I I I I I I I M II I I II I I I I I I I I I I I M M : : 
EKDTTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDH 
600 610 620 630 640 650 

660 670 680 690 700 710 

WSKMEG I PQGE I VWDN DW IXRTFKAEN FH I QGGQAVI SRNVAKVEGDXHLSNHAQAVFGV 

It: I I M : I I II I 1 I I I I II II I I I I : I : I I I I I : I I I I I I I : M I I I I I I I I I I I I 
WSQKEGIPRGEIVWDNDWINRTFKAEN FQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGV 

660 670 680 690 700 710 

720 730 740 750 760 770 

APHQSHTICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLX 

M I I I I I II II I M I I I I I I I I : I I I I I I I I i t I II I II I I Ml I • I 

APHQSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLN 

720 730 740 750 760 770 

780 790 800 810 820 830 

GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNG 

M II I I I I I I I M I I I I I I I I II I I I I I I U I I I I I I I I I I : I I I I I I I I I I I : i I I 
GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNG 
780 - 790 800 810 820 830 

840 850 860 870 880 890 

SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 
I I I I I I I I I I I I I I I I 1 I I I M M 1 11 1 I I I : 11 1 I 1 I : I I : I t I 1 I I I I M I I I I I I 
SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
840 850 .860 . 870" . 880 890 
900 910 920 930 940
orfla pep TE LGN LN LDN AT I TLN S A YRH D AAG AQTGXVS DT PRRRS RR S LLSVTPPTSVESRFN 

I I I M I M ! I M I I I I I I I II M I i I I M : : I : I M f I I I I I II I I I I I I M M I I 
orf 1-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 

900 910 920 930 940 950 

950 960 970 980 990 1000 

orfla . pep tltvngklnxqgtfrfmselfgyrsdklklaessegtytlavnntgnepvsldqltweg 

I M I I i I I I I I I II M I I II I I I II I I I I I M I I I I I I M I I I I I I I I : 1 I : I I I I I II 
o r f 1 - 1 TLT vngklngqgt frfmselfg yrs dklklae s segt ytlavnntgne pas leqltweg 

960 970 980 990 1000 1010 

1010 1020 1030 1040 1050 1060 

orfla . pep kdnkplsenlnftlqnehvdagawryqlirkdgefrlhnpvkeqelsdklgkaeakkqae 
I I I I I I I I 1 II I M II I I II I I I I I I I I I I I I I I I II I I I I I \ I I I I I I I I I I I M I I I I 
orfl-1 kdnkplsenlnftlqnehvdagawryqlirkdgefrlhnpvkeqelsdklgkaeakkqae 

1020 1030 1040 1050 1060 1070 

1070 1080 1090 1100 1110 1120 

orfla . pep kdnaqsldaliaagrdaaektesvaeparxaggenvgimqaeeekkrvqadkdsalakqr 

I I 1 I I I I I I I I I I I I I I : I I I I I I I I I I I ' I I I I I I I I II I I I I M I I I I I M : I I M I I 

orfl-1 kdnaqsldaliaagrdavektesvaeparqaggenvgimqaeeekkrvqadkdtalakqr 

1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

orfla . pep EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 

II I I I 1 II 1 II I I I I I I I I I I MINIM MM II M M I I I M I I M M I I M 
orfl-1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP— QRDLISRYANSGLSEFSATLNSVFAV 

1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orfla . pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
M I I I M M II I M I I I M II II II M M II II I I M M M I I I I I I II I M M II M 
orfl-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orfla. pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
I M I I I M II II II II II M I i I I I I I M M I II I M M M M I I I II I II M II I I 
orfl-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orfla. pep HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
I II I I I II I I II M II M I : I II I I II II I II M M II M I I M I I I I I II I II M M M 
orfl-1 HYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
1320. 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

orfla . pep KPAQHXSITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 
I II I 1 I I II I I M M M II I II II II I II II II I II I I II M I I I I I I M II M I II 
orfl-1 . KPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHA 

1380 1390 1400 1410 1420 1430 



1430 1440 1450 

orfla . pep AAAKG PQLEAQHS AG I KLG YRWX 

I II I M I II II I II II I II M I I 
orfl-1 AAAKG PQL.EAQHSAG I KLG YRWX 

1440 1450 



Homology with adhesion and penetration protein hap precursor of H. influenzae (accession number P45387)
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in 450 aa overlap:

orfl 23 FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 

F +L C+S GI QAWAG HT Y FG I + YQY YRD FAENKGKF VGAK+ IEVYNK+G+LVG 
hap 6 FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 

orfl- 83 KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 

SMTKAPMIDFSWSRNGVAALVG QY I VS VAHNGG YN +VDFG AEG N DQ R TY+IV 
hap 66 TSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 








orf1 143
hap 125
orf1
hap
orf1
hap
orf1 278
hap 305
orf1 335
hap 364
orf1 394
hap 424

Amino acids 715

orf1 41
hap 733
orf1 99
hap 793
orf1 159
hap 853
orf1 219
hap 900
orf1 279
hap 960



KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 
KRNNYQAWERKHPYDGDYHMPRLHKFVTEAE PVGMTTNMDGKVYADRENYPERVRIGSGR 184 

QYWRS DE DEPNNRE SSYHIA 222 

QYWR+D+DE N SSY+++ 



-SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRKDWFYDEIFAGDTHSVF 277 
SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 



Y P NG YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 



A G N Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V +NN TWQGA 



GV I +D+TV WKV+ NDRLSKIG GTL 



DT RYT VS HNATQ-NGNXS LVXNAQAT FNQ- AT LNGNTSAS GN AS FN LS DHAVQNGS LTLS 98 
DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 



GNAKANVSKSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 158 
+A A V++-.- LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 



L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

LTLNNSTVTLNSAY SASSNNAPRHRRS LETETTPTSAEHRFNTLTVN 8 99 



GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 



LS+ L FTL+N+HVDAGA 
LSDKLKFTLENDHVDAGA 977 

Amino acids 1192-1450 of ORF1 show 41% aa identity with hap protein in 259 aa overlap:

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60
LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R
LDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQKTNLRQIGVQKALANGRIGAVFSHSR 1194

orf1 61 TENTFDDGIGNSARLAHGAVFGQYGIDRFYXXXXXXXXXXXXXXXXXIGXKXRRRVLHYG 120
++NTFD+ +NAL+FQY K R+ ++YG
hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 1254

orf1 121 IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 180
+ A Y+ G GI + P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P
hap 1255

orf1 181 QHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240
+ IS+ PY ++Y D ++ V+T VN VL Q FG+ E G+ AEI F +S + +
hap 1315

orf1 241 KGPQLEAQHSAGIKLGYRW 259
+G QL Q + G+KLGYRW
hap 1375
Homology with a predicted ORF from N. gonorrhoeae

The blocks of ORF1 show 83.5%, 88.3%, and 97.7% identity in 467, 298, and 259 aa overlaps,
respectively, with a predicted ORF (ORF1ng) from N. gonorrhoeae:
In each block below, the upper sequence is orf1.pep and the lower sequence is orf1ng.



MKTTDKRTTETHRKAPKTGRIRFXAAYLAI CLS FG ILPQAWAGHT YFG INYQYYRDFAEN 60 
| | | | M I 1 I I I M t I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 60 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 120 
| | | | | | | | I | | | U I I I I I I I I II I I I M I I I I I I I I I I I I I II : I I I I I I I I I I II II 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHKGGYN 120 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 

I | | t I ) I I I II I : I : II I M I I I I I I : I I 1 I II M I I I I I I I II II I I I I II II 

NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 17 9 

MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDE PNNRESSYHIAS 223 

Ml || I I : I II I I I I II I I) I I I I II I I I I I I I I I I I I I I 

MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 239 

GS PMFIYDA QKQKWLIN GVLOTGNPYIGKSNG 255 

I I I I I I I I I I M I I I I M I I I I I M I I I I I II 

GGTVNLGSSKIKHSPY GFLPTGGSFGDSGS PMFIYDA QKQKWLIN GVLOTGNPYIGKSNG 28 9 

FOL VRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 

I | II | I I I M I I I I I I I I I I II I II : I I I I I I I I : i I t : i I I : i I I : I ill II II I I 

FOL VRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 359 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 375 
I I I II I I I II I I II M I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I I I i I I M I I 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY • 

FQGDFTVS PENNETWQGAGVHI SEDSTVTWKVNGVANDRLSKIGKGT 4 22 
I : I : i M I ! : I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I II I I 

FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 4 7 9 

// 

DKVTASLTKTDI SGNVDLADHAHLNLTGLA 7 4 4 
III II I : I I I : I II : I 1 I I I I I I I II I I 

FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 77 4 

TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 803 
I : I I i I : : : : I I : I I I I I I I IN I II I I I I II I I I I I I I I I I I I I I I I :: I 

TFNGN L- VQAETRT IRLRANATQNGNLSLVGN AQAT FNQATLNGNT SAS DNAS FNLSNNA 833 

VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 8 63 
I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I II : I I I M : I II ! I II I M I I I II I I 

VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893

LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 923 
I I I I : I I I I I I I I M I I I M I I M I I I II I I I I I I : I I I I I I I I I I II I II I II : I 

LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 98 3 

I 1 I I I I I I I I I M I I I I I 1 I I I I I I I II I I I II I I I I I 11 I I I I I I I I I I I I : I I M I I 
SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

WEGKDNKPLSENLNFTLQNEHVDAGAW 1011 
Mill!! I I I I I I I I I I I I I I M I M I 

WEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHN PVKEQELSDKLGKAGET 107 0 

// 

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 
I I I I II I I I I I I It I I M I I I I I i I I I I I I 

PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 123 9 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 

II | | I I | 1 I I II I I II I I I I 1 I I II I I II II I I I I I I I I I I I I I I I I I 1 I II M I II 
AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 12 99 
orf1.pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331

II M I I I I I I I I I I 1 I ! I I I I I I I II 1 II I I M I I I I I I I I I I I I I I M I I I I I I I I 

orf1ng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359

orf 1 .pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 

I I I I II I I I I I I I I II I I I I II I I I I I I I I I I I I I I II I I I I I II I M I I I I I I I I I I I I 

orf1ng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419

orf1.pep AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I 

orf1ng AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1468

The complete ORF1ng nucleotide sequence was identified <SEQ ID 653>:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA
51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT
251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC
301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG
351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC
401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT
451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT
501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA
551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC
601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC
651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA
851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC
901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC
951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG
1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT
1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT
1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA
1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA
1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG
1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT
1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT
1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG
1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC
1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT
1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT
1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC
1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC
1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG
2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA
2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA
2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC
2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA
2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA
2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG
2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG
2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG
2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC
2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA
2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG
2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC
2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA
2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC
2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC
3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA
3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg
3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc
3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc
3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA
3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg
3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG
3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA
3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG
3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG
3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC
3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC
3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA
3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC
3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC
3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA
3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC
3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC
3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT
3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT
4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG
4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC
4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT
4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG
4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC
4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA
4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG
4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG
4401 CTGGTAA

This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT
151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF
301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS
351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK
401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK
451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT
551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD
601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA
651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK
701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS
751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL
801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS
851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL
901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE
951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG
1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR
1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA
1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA
1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL
1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI
1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI
1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT
1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL
1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK
1451 GPQLEAQHSA GIKLGYRW*

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
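
The ATP/GTP-binding site motif A (P-loop) is commonly described by the PROSITE pattern PS00017, [AG]-x(4)-G-K-[ST]. As a hedged illustration (the tool actually used to annotate the motifs is not stated here), a protein sequence can be scanned for that pattern as follows. The helper name `find_p_loops` is hypothetical.

```python
import re

# PROSITE PS00017, ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping candidate sites be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein):
    """Return (1-based start position, matched residues) for each P-loop-like site.

    Whitespace and the terminal '*' of a sequence listing are ignored.
    """
    seq = re.sub(r"[\s*]", "", protein.upper())
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]


if __name__ == "__main__":
    # Residues 281-300 of <SEQ ID 654>.
    print(find_p_loops("WLINGVLQTG NPYIGKSNGF"))
```

Applied to residues 281-300 of <SEQ ID 654> (WLINGVLQTGNPYIGKSNGF), the sketch reports GNPYIGKS starting at the tenth residue of the fragment, that is, at residue 290 of the full sequence. A pattern match of this kind only flags a candidate site; it does not by itself show that the site binds nucleotides.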



ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap:
10 20 30 40 50 60

MKTTDKRTTET HRKAPKTGRI RFS PAYLAI CLS FG I LPQAWAGHT YFG IN YQYYRDFAEN 

M I I I II I I M I I I I I I I I I I II I I I I II II I I I I I II I II M I II M II I I I I 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 
| | | f 1 | I I I I I I M I M I M I I I I i I I I I I I 1 I I I I I I I I I I I I : I I I I I I I M I I M II 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 180 

NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
I I I I } I I 1 M I I M I : I :! I I II I I I I I I = I I I I I II I I I M I I I I I I I I I I N I M I I 
NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMFRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 

190 200 210 220 230 240 

DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

M I I I I : M I I I I I I I I I I M 1 I t I II I M II II I II M I I I I I I Ml I I I I I I I II 
DGWKY ADLNKY PDRVRIGAGRQYWRS DE DE PNNRE S S YH I AS AYS WLVGGNT FAQNG SGG 
190 200 210 220 230 240 

250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
I M I M I I I I I I I I I I I I M I I I I I I I I M I I I I I I M I I I I I I I I I M I I I I I I I I I I I 
GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

250 260 270 280 290 300 

310 320 330 340 350 360 

QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 

| | | M I I I I 1 1 I I II I I I I I I I I I : I M I I 111:111:111:111:1 Mi I I I I I I I 
QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 

310 320 330 340 350 360 

370 380 390 400 410 420 

QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 

Mil [ I I I II I I I I I II I I I M M I M II I I I I : I II M II M II M I II I I I I I 

QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLYF 
370 380 390 400 410 420 

430 440 450 460 470 480 

QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 

: | : | ! | | I : II I I I M M II II : I I I 1 M II I M I M I M M II M M I I I I I M I : I 
EGN FT VSPKNNETWQGAGVH I SDG ST VTWKVNGVANDRLSK I GKGTLLVQAKGENQGSVS 
430 440 450 460 470 480 

490 500 510 520 530 540 

VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH • 

INI I I I I II II I : M I I I II I M I I I M II I I M II I II I M II I I I I II I I I II II I 
VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 
490 500 510 520 530 540 

550 560 570 580 590 600 

SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 
i | M I I I I M I I I I I M I II I I I I I 1 I I I I I I I I I : I I I I 11 : I I I i I I M I M M M I I 
SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKEIAYNGWFGEKD 
550 560 570 580 590 600 

610 620 630 640 650 660 

TTKTNGRLKLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 

: | | | M I M I Ml M II I I II I I I I M M 1 I M I I M M II M I 11 II M : : I I : 
ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 
610 620 630 640 650 660 

670 680 690 700 710 720 

KEG I PRGE I VWDN DW INRT FKAEN FQI KGGQAW SRNVAKVKGDWHLSNHAQAVFGVAPH 
M M : I I I II II M I : 1 I M I I M : I : I I I M M I I M I I : M M I I M M II I M M I 
MEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGVAPH 

670 680 690 700 710 720 
730 740 750 760 770 780

OSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL 

■ i I I I I I I | | | || ! I : I : I I I M I I I I I I M : I I I I I M : M 1 I I 1 M I M I I I M M I 
OSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 

790 800 810 820 830 840 

SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 

I I : t | | : M 1 :: | I M 1 I I I I I I I M I I I M I 1 I I I M I I I I M I I I M : : I I I I I M I 
SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 

790 800 810 820 830 840 

850 860 870 880 890 900 

LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 

M l | I | | | I I I I I II M I! I M I I I I I I : M I I I : I I I I I M I I M M M I I I I I I I I I 
LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 

850 860 870 880 890 900 

910 920 930 940 950 960 

GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 
M | I 1 I I I II I M 1 I 1 I M 1 I I I I I : t I I I I I I I I I M I I I I M I : I I I I I I I I 



GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR---RSLLSVTPPTSAESRFNTLT
910 920 930 940 950



970 980 990 1000 1010 1020 

VNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDN 

I | | | | | | | I M I I I II I I M II I M M I I I I 1 M I I I 1 U M I II : I I I I M I I I M I I 
VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 

960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 
KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHN PVKEQEL.SDKLGKA 

I 1 M I I I I M I It I I 1 M I 1 I I I I I II I I I I II I I 1 I I I 

^PLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 

1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 

EAKKQAEKDNAQSLDALIAAGRDAVEKTESVAE PARQAGGENVGIMQAEEEKKRVQ 

I | : I II I I I I I I I I H I I M I : I : I I : I I M M I I I I I I I I : I I I 1 I I I M I I M 
QAQLAAKQQAEKDNAQSLDALIAAGRNATEKAESVAEPARQAGGENAGIMQAEEEKKRVQ 

1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 

[MM!! I I I I II I I I I I I I I I I I I III I I I I I I M M II I I II I I M I I I I I I 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 

1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
| | | I 1 I I I I I I 1 11 11 I I 1 I M I II I I I I I I I I I t I I M I I II M II I M I I I ! I I M I I 
ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 

I I I M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I M 1 I I I I I I 
SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

GGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 

I I I I I I I I I I I I I I I I I II I I I M I I I I II I I I I i 1 I 1 1 I I I ! I I I I I I I I I I i 

RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVIAQDFGKTRSAEWGVNAEI 
| I M I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I II I I I I I I I 
AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1380 1390 1400 1410 1420 1430 
1430 1440 1450

orf 1-1 . pep KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
I I I I I I I I I I I I I I M M I I I I I I I I M I II I 
o r f 1 ng - 1 KGFTLS LHAAAAKGPQLEAQHS AG IKLGYRWX 

1440 1450 1460 
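
The identity percentages quoted for these alignments can be reproduced from any pair of gapped alignment rows. The sketch below assumes one common convention: identical residues divided by the number of aligned columns, gaps included, with the unknown residue X never counted as a match. The convention used by the alignment program here is not stated, so results may differ slightly. `percent_identity` is an illustrative name.

```python
def percent_identity(row_a, row_b, gap_chars="-"):
    """Return (percent identity, overlap length) for two aligned rows.

    Both rows must have the same length, with gap characters marking
    insertions and deletions. A column counts as identical only when both
    rows carry the same residue and that residue is neither a gap nor X.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    matches = sum(
        1
        for a, b in zip(row_a.upper(), row_b.upper())
        if a == b and a not in gap_chars and a != "X"
    )
    overlap = len(row_a)
    return 100.0 * matches / overlap, overlap


if __name__ == "__main__":
    # A 20-residue stretch of the ORF1-1 / ORF1ng alignment (two mismatches).
    print(percent_identity("GNKDIATTGNNNSLDSKKEI", "GNKDITTTGNNNNLDSKKEI"))
```

For the 20-residue stretch shown, 18 columns are identical, giving 90.0% identity over a 20 aa overlap.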






In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455 aa overlap:

SCORES Init1: 1104 Initn: 4632 Opt: 2680

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap 
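
For orientation, a Smith-Waterman score is the best score of any local alignment between two sequences. The sketch below computes that score with a linear gap penalty and simple match/mismatch values. These parameters are assumptions chosen for illustration, not the substitution matrix or gap penalties behind the score reported above, so it will not reproduce the value 5165.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the dynamic programming matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

With these illustrative parameters, the two short sequences in the usage line score 12 (seven matches and two single-residue gaps). Protein comparisons in practice normally replace the match/mismatch values with a substitution matrix and use separate gap-open and gap-extension penalties.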



10 20 30 40 50 60 

orf lng-1 . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

| : I : I : I : I I : I I I M I I I I 1 : I I I I 11 I I I I 
P 4 5387 MKKT VFRLN FLT AC I S LG I VS QAWAGHT Y FG I D YQ Y YRD FAEN 

15 y 10 20 30 40 

70 80 90 100 110 120 

o^f lng-1 . peD KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 
I I I I : I I I : : I : I I I I : I : I I I I M I I I I M M I I I I I I I I I I : : I I I I I I I I I II: 
20 p4 5387 KGKFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSWSRNGVAALVENQYIVSVAHNVGYT 

50 60 70 80 90 100 

130 140 150 160 170 180 

orf Inq- 1 pep NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
25 : M I I I I I : I I I M M : I : I I I I I I I I I Ml I I I M I I I I II : I I : : I M I 

p4 5387 DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 
110 120 130 140 150 160 

190 200 210 220 230 240 

30 o-f lng- 1 . oep DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

: I I : I : I t I : I I I I I : I I I : I 1 : I : I : : : : I : I I : I : : I I I I I : I : 

D4 5387 NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN 

170 180 190 200 210 

35 250 260 270 280 290 300 

orf lng-1 .oeo GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
I 1|:: | : I I I I : I I I I M I M I I I I I : I I I I I I I I : I : III: I I III 
p4 5387 GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 
220 230 240 250 260 270 

40 

310 320 330 340 350 360 

orf lng-1 - pep QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
I M I I : : I M II ! I : : I I 1 I : : ! : I i I : I I : : I : : I : 

p4 5387 QLVRKSYF-DEIFERDLHTSLYTRAGNGVYTISGNDNGQGS ITQKS GIPSEIK 1 

45 * 280 290 300 310 320 

370 380 390 400 410 419 

orf lng-1 .pep QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
I I : I ] : : I : : I I I I I I I I I I :: I : I :: I I I :: I : I I I I I I I II 

50 p45387 TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 

330 340 350 360 370 380 

420 430 440 450 460 470 479 

orf lng-1 .pep FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 
55 M I II M I I : : I : I I I I I I : I : I : : I M I M I II I : I II I I M I I I I I I I I I I I : I I : 

p4 5387 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI 

390 400 410 420 430 440 

480 490 500 510 520 530 539 

60 orf lng-1 - pep SVG DGKV I LDQQADDQGKKQAFSE I GLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

I | I | I I II I : I I I I I 11 : I I I I I I I I M I I I II I M I I : I I : II : I I I I I M I I I I I I 
p4 5 3 8 7 SVGDGKV I LEQQADDQGNKQAFSE I GLVSGRGTVQLNDDKQFDT DKFYFGFRGGRLDLNG 

450 460 470 480 4 90 500 

65 540 550 560 570 580 590 

orf lng-1 . pep HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG 
I I t : I : M M I II I I I I I I M : : : I 1 I I 1 I : : I : -III I : I I : I I I I 1 I I I I I 
p4 5387 HSLTFKRIQNTDEGAMIVNHNTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 
510 520 530 540 550 560 

70 
600 610 620 630 640 650 

orflna-1 Dep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

9 P P | | :| | || || I 1:1 lllll Mill I I: MM II: II I II III II Mr: 

^4^87 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 
5 P 570 580 590 600 610 620 

660 670 680 690 700 710 

orflna-1 Deo WSKMEGI PQGEIVWDNDWI DRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV 

9 ' P P ||: I M I M I : I M = M I M I II : 1 : M : I 1 M I I 1 : : : I t r I 

1 0 n4 5387 WSEMEGI PQGEI VWDHDWINRTFKAENFQIKGGSAVVSRNVSS IEGNWTVSNNANATFGV 

1U P 630 640 650 660 670 680 

720 730 740 750 760 770 

orflna-1 pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 
15 9 :i:|::MIIMMtlll:l : =11 Mi I: MM I: III M 

o4S3R7 VPNQONTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN 
P 690 700 710 720 730 740 

780 790 800 810 820 830 

20 orf lng-1, pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNAS^LSNNAVQNG 

p4 5387 GNVTL TNHSQFTLSNNATQIG 

750 760 770 

?5 840 850 860 870 880 890 

orf ina-1 pen SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 
9 ' P P :: MM: Ml::: Mill IMM I ::ll:l: MM I I?': I::: IMM 



: : lilt: | : j : : : Milt I : I : 1 I ; ; i i - i • - i ■ i ■ m • » 1 » 

p4538 7 NiRLSDNSTATVDNANLNGNVHLTDSAOFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 

780 790 800 810 820 830 



900 910 920 930 940 950 

o-flna-1 pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 
" I | I I : I : I : I I I II I I I : : I : :: II I i I I : I M I I I II M M 

p4 5 38 7 TTLQNLTLNNSTITLNSAY SASSNNTPRRRS LETETTPTSAEHRFNTLT 

35 840 850 860 870 

960 970 980 990 1000 1010 

orf lng- 1 pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 
| M II : I M M : I I M I I : I I I I I : : : : I I I Ml 111:11 = M II I : I I : M I 
40 p4 5387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 

r ' ' 880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

orf lnq-1 pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
45 ^ I M : : I : II I : I : II M II I I : I : : : M M I I I M : I M M : I : I : : I : I M 

p4 53 8 7 Q P L S DKLKFT LEN DH V D AGALRYKLVKN DGE FRLHN P I KEQELHN D LVRAEQAERT LEAK 

940 950 960 970 980 990 

1080 1090 1100 1110 1120 1130 

50 orf lng- 1 pep QAQLAAKQQAEKDNAQSLDALIAAGRNAT-EKAESVAEPARQAGGENAGIMQAEEEKKRV 

I : : : I I I : : : : : I I II : : : : : I Mil M : : : : : M I 
P45387 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE-LTAETQKSKAKTKKV 
1000 1010 1020 1030 1040 1050 

55 1140 1150 1160 1170 1180 1190 

orflna-1 pep OADK DTALAKQREAETRPATTAFPRARRARRD- LPQPQPQPQPQPQRDLI SRYANSG 

:: : : I I : I :: :::::! I I : : I : I : M I M M I M 

p4 5387 RSKRAVFSDPLLDQSLFALEAALEVIDAPQQSEKDRLAQEEAEKQ-RKQKDLISRYSNSA 
1060 1070 1080 1090 1100 1110 

60 

1200 1210 1220 1230 1240 1250 

orf lng-1 pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 
M I : I II : I M : : I M M M : I : : : : : I I M : : I • : I M II I M M I : I II II 
p4 5387 • LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 

65 ^ ~~ 1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 1310 

orf lng-1 . pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 
: || 1 : : M : I : II I : 1 : MM: : I I M : M M I : : : I : : : I : I : I : : 
70 p45387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 

1180 1190 . 1200 1210 1220 1230 
1320 1330 1340 1350 1360 1370 

orf lng-1 . pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL 
: : : : I I : I : : : : ! I : : I I : : I : I I : I : : I : : I I I : : : : I : I : I : I I : I 

P 4 5387 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSL 
1240 1250 1260 1270 1280 1290 

1380 1390 1400 1410 1420 1430 

orf lng-1 . pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW ; 

MM! I II : : I M i I : : : I I : I I : : : I : I : : : : : I : I II Ml I II: : I 
p4 5387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV 
1300 1310 1320 1330 1340 1350 

1440 1450 1460 1469 

orf lng-1 . pep G VN AE I KG FT LS LHAAAAKG PQLE AQH SAG I KLG YRWX 

I : : I M I : I : : M M I : : : I : I j I | II 
p4 5387 GLKAEILHFQISAFISKSQGSQLGKQQNVGVKLGYRW 
1360 1370 1380 1390 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 78 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>: 

1 . . AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg . ATCAGGCAAA 

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 
301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE 
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK 

101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 
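
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, not part of the original analysis: it assumes the standard genetic code and a reading frame starting at the first listed base, and the names `CODON_TABLE` and `translate` are illustrative. Any codon containing an undetermined base (shown as "." in the listing) is reported as X.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with unknown bases become 'X'."""
    seq = "".join(dna.split()).upper()      # drop the spaces between 10-mers
    usable = len(seq) - len(seq) % 3        # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 50 bases of the partial sequence, including the undetermined base '.'
partial = "AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA"
print(translate(partial))  # KVWQFVEXPLRAVVPA
```

The two bases left over at the end of the 50-base excerpt do not form a complete codon and are ignored.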

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 . . CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-1>: 

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF 
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 

101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N. 
meningitidis: 

10 - 20 30 



orf 6 . pep 
orf 6a 

orf 6. pep 
orf 6a 

orf 6. pep 
orf 6a 



KVWQFVEXPLRAWPADSFEPTAQKLNLFK 

I I I I I ! I I I I I I M I II I i I I I II I M I 
QIVEHAVLHTPSSFNSQSARVWLFGEEHDKVWQFVEDALRAWPADSFEPTAQKLNLFK 

40 50 60 70 80 90 

40 50 60 70 80 90 

AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 
M I II I M I M I I I I I I I I I I I I I I I I i I I I I I I I M I I II I I II I I I I I I I M I I I I I I 
AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 
100 * 110 120 130 140 150 

100 110 120 130 140 

NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
M I I I I I M I I I I I I I II I I I I I I I II I I I I I I I I I I M 1 I I I II I I I I I I 
NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
160 170 180 190 200 



The complete length ORF6a nucleotide sequence <SEQ ID 659> is: 



1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 
51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 
101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC 
151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 
201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 
251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 
301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 
351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG 
401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 
451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 
501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 
551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 
601 GCATAA 



This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 

51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY 

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH 

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG 

201 A* 



ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 

50 60 70 80 90 100 

40 orf 6a .pep TPSSFNSQSARVWLFGEEHDKVWQFVEDALRAWPADSFEPTAQKLNLFKAGAATILFY 

I I I I 1 I I I'M I M I I I I M I I I I I I I I I M 
o r f 6 - 1 LRAW PADS FE PT AQKLN LFKAGAAT I LFY 

10 20 30 

45 110 120 130 140 150 160 

orf 6a . pep EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQH YNPLPDAAIA 
I I I I! I II I II I I I I I II I I I I I I I 1 M I I I I I I I I I I I M I I II I II II I I I I I I I I I I 
orf 6-1 E DQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVG AN LQH YNPLPDAAIA 

40 50 60 70 80 90 

50 

170 160 190 200 

orf 6a . pep KAWN I PENWLLRAQMVI GG I EGAAGEKT FE PVAERLKVFGAX 
I I I I I I i I I I I I I I M I I I I I I I I I I I I I I I I II I I II I I I I 
o r f 6 - 1 KAWN I PEN W LLRAQM V I GG I EG AAG EKT FE PVAEP.LKV FG AX 

55 100 110 120 130 

Homology with a predicted ORF from N. gonorrhoeae 

ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from 
N. gonorrhoeae: 
orf6.pep KVWQFVEXPLRAVVPADSFEPTAQKLNLFK 30 

I II I I I I I I 1 I I I I I I I II I III I : I I I 
orf 6ng SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAWPADSFEPTAQKLKLFK 64 

orf 6 . pep AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 90 

I I I I I I I I I I II II I 11 II I I I I I I I I I I I M M I I I I I I I I I I II I I I ! II : I I I I M 1 
orf 6ng AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY 124 

orf 6. pep N PL PDAAIAKAWN I PENWLLRAQMV I GG I EGAAGEKTFEPVAERLKVFGA 140 

I | I I I : I I II I I I It II I I I I I I I I I I I I I I I I II I : I I I I I I I II I I i I 
orf 6na NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 17 4 
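
The identity percentages quoted for these overlaps are the fraction of aligned positions carrying the same residue. The following is a minimal sketch of that arithmetic for an ungapped overlap; it is not the alignment program used for the analysis, the function name `percent_identity` is illustrative, and the two strings are simply the final 50 residues of the ORF6 and ORF6ng overlap shown above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in an ungapped, equal-length overlap."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Final 50 residues of the ORF6 / ORF6ng overlap (residues 91-140 of ORF6).
orf6_tail = "NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA"
orf6ng_tail = "NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA"

print(f"{percent_identity(orf6_tail, orf6ng_tail):.1f}%")  # 96.0%
```

By the same arithmetic, 134 identical positions in a 140-residue overlap would give 95.7%. Gapped alignments need the gap columns to be handled explicitly, which this sketch does not attempt.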



The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as: 

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT 

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC 

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

This encodes a protein having amino acid sequence <SEQ ID 662>: 



1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA 

51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA 

101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI 

151 GGIEGAAGEK VFEPVAERLK VFGA* 
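
The sequence listings in this document are written as position-numbered rows of ten-character blocks. The following is a minimal sketch of how such rows can be joined into one continuous string before further analysis; the function name `parse_listing` is illustrative, and the sketch assumes the listing contains no undetermined positions (it discards "." characters, so partial sequences with unknown bases would need different handling).

```python
import re


def parse_listing(text: str) -> str:
    """Join a position-numbered listing into one uppercase sequence string."""
    pieces = []
    for line in text.splitlines():
        # Remove position labels, spaces and punctuation; keep letters and '*'.
        pieces.append(re.sub(r"[^A-Za-z*]", "", line))
    return "".join(pieces).upper()


# Last two rows of the ORF6ng nucleotide listing, copied as printed.
listing = """
451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga
501 acgtttgAAA GTGTTCGGCG CATAA
"""

seq = parse_listing(listing)
print(len(seq), seq[-3:])  # 75 TAA
```

Applied to the full listing, which runs to position 525, this gives 175 codons, consistent with a 174-residue protein followed by a stop codon.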



ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap: 

10 - 20 30 
LRAWPADSFEPTAQKLNLFKAGAATILFY 
I I I I I I I M II I I I I I I : I I I I I I I I I I I I 
PT VLRMGLPLY I ASLRRGAI YKVWQFVE DALRAWPADS FE PTAQKLKLFKAGAATI LFY 
20 30 40 50 60 70 



orf 6-1 . pep 
orf 6ng 



40 50 60 70 80 90 

orf 6-1 . pep EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 
M I I I M I I I I I I I M I I II II I I I I II I I 1 I I I I I I I I I I M : I I I I I I 11 I I I I : I I I 
orf 6ng EDQNWKGLQEQFPAYAAKFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA 
80 90 100 110 120 130 

100 110 120 130 

o r f 6- 1 . pep KAWN I PENWLLRAQMVI GG I EGAAGEKT FE PVAERLKVFGAX 
I I I I I I I II I II I I M I I II I I I M I I : I I I I I I I II I I III 
orf 6ng KAWN I PENWLLRAQMVI GG I EG AAGEKVFE PVAERLKVFGAX 

140 150 160 170 



It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 79 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663>: 

1 . . GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGGt. CAATGCCAAC 

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 
251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC 
301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 
351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 
401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 
451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 
501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 
551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 
601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC.. 



This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL 

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 

201 QDWKLKAEYD Y. . 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 



1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 
101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 
151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 
501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 
551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 
601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 
1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 
1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 
1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 
1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 
1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 
1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 



This corresponds to the amino acid sequence <SEQ ID 666; ORF23-1>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA 

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 






251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 
351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 
401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 
451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 
501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 
551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 
601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 
651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 
701 YRTQPDRHSY GALRTVNAAF TYRFK* 



Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047) 
ORF23 and PupB protein show 32% aa identity in 205aa overlap: 

FARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRK 65 
++RG I NY+++G+P + L D + + A ++RVE+VRG GL+ G G PSAT+NL+RK 



Orf23 


6 


PupB 


215 


Orf23 


66 


PupB 


274 


Orf23 


126 


PupB 


334 


Orf23 


184 


PupB 


392 



R T + 



EAGN 



+G DVSG L 



+YGI E+D++ T + 



D+PL 



+RGR V+ + 



S G 



N A +W+ 



+ H 



+ F IE + 



W K E 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N. 
meningitidis: 



35 



40 



45 



50 



55 



orf 23 .pep 
orf 23a 

orf 23 . pep 
orf23a 

orf 23 .pep 
orf23a 

orf 23 . pep 
orf23a 



10 20 30 

GYNYLFARGSRIANYQINGI PVADALADTG 

I I I I I I M I 1 M I M I I I I I I I I I I 

QMR DQN I KALDRALLQATGTSRQIYGS DRAG YNYLFARGSR I AN YQING I PVADALADTG 
90 100 110 120 130 140 

40 50 60 70 80 90 

NANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD 

I I I | M I I I I I I I I II I I I I I I I I I I I I I II I ! I I I I I I II I I I I I I I II M II M II 
NANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD 
150 160 170 180 190 200 

100 110 120 130 140 150 

VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK 

111111 = 1 : I I I II I I I I I I II I I I : I I I I I I I I I M I I I II 11 I 1 II I I I I I I I I I 
VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK 
210 220 230 240 250 260 

160 170 180 190 200 210 

ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD 

I U I || I I II I I II I I I I II I I I I I I I I I I M I I : I I I I I I I I M I I I I I I I I M I I I I I 
ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD 
270 . 280 290 300 310 320 



60 



orf 23 .pep 
orf 23a 



I 

YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA 
330 340 350 360 370 380 

The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 






1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG 
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 668>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA 

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap: 

10 20 30 40 50 60 

orf 23a . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
I | | | M I I I I I I I I I I II II I I I I I II I I I M I I I II M I I I II I II I 11 I I I I 1 I II I I 
orf 2 3-1 MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 23a . pep PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDE^AGYNYLFARG 
M I I I I I || I I II t I I M I I M I I I I I I I : I I I I I i I I I I I M I I I I I I I I I I I I I I I I I 
orf 2 3-1 PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 23a pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTR 
I I I | I I I M I I I I II I I 1 I I II I M I I I I I M I I I I I I 1 I I I I I I I I I I I I I M II I II 
orf 2 3-1 SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 23a . pep KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 
I I | I I I I I II I I I I I I I I I I I I I I I I : I I I I II I I I I I II I I I I I I : I I I I I I I I I I I I 
orf 23-1 KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

190 200 210 220 230 240 



20 



250 260 270 280 290 300 

orf 23a . pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

I I I M I I I ! I I I I I I II I I I I I I I I M I I I ! ! I ! I I I I I I I I I I II I M ! i I I I I I M M 
orf 2 3-1 LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

250 260 270 280 290 300 



25 



30 



35 



310 320 330 340 350 360 

orf 23a .pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
I I I I I I I I M I I M I I I I I I I I I I I II M I I I I I I M I I I I I I I I M I I I I I I I M I I I I 
orf 2 3-1 NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 23a. pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERS 1 1 PNAI PNAYEFSRTGAYPQPAS 

I I I M I I I I I II I I I I I I I I I I I I I II I I I I I I I I I M M I M 1 I I I I I I I I I M I I M I 
orf 2 3-1 SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERS 1 1 PNAI PNAYEFSRTGAYPQPAS 

370 380 390 400 410 420 



40 



430 440 450 460 470 480 

orf 23a .pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGN5TYVSANRFT 
I I I I I I I I I I I I II I 1 I I I I I I I II I II I I I I I I I I : I M I I I I I I I M II I I I I I I I I I 
orf 23-1 FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 

430 440 450 460 470 480 



45 



490 500 510 520 530 540 

orf 23a .pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I I I I I I I M I 1 I I I II I II I I II I I I M I I I I I I 1 I I I I II I I I I I M I I I I I I I I I I I I 
erf 23-1 PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

490 500 510 520 530 540 



50 



55 



60 



65 



550 560 570 580 590 600 

orf 23a . pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I I M I I I M I I I I M I I I I I I I I I I I I I I I M I M I I II I I I I I I I I I I I I I I I I I I I I 
orf 2 3-1 AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 23a. pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I I I II I I I I I I I I I I I I I I I I I I I I I I M I II I I I I M M I I I I I M II I I I I I M I I I I 
orf 2 3-1 DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 23a. pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
I I I I I I I I I M I I I I I I I I I I I I I I I I I I 1 I I II I M I I I I I I I I M II I I M I I I M I I 
orf 2 3-1 ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



70 



orf 23a. pep 
orf23-l 



TYRFKX 
Mill! 
TYRFKX 






Homology with a predicted ORF from N. gonorrhoeae 

ORF23 shows 93.4% identity over a 211aa overlap with a predicted ORF (ORF23.ng) from N. 
gonorrhoeae: 

orf 23 -pep 
orf 23ng 
orf 23 .pep 
orf23ng 
orf 23. pep 
orf 23ng 
orf 23. pep 
orf 23ng 



GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLD 

I I I I I I I I M I M I I I I I I I I I M I I I I I I I I I I I I I M I I I I M I I I I I 
SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPD 



51 



60 



GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111 

I I I I I M I I M I I I : I I I I I I M I I I I I I 11 I I I i 11111111:1 : I I I I I I 1 I I I I 

GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120 

GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171 

Mill: MM I M M I M M I M I M I M M I I I I I I I I M 1 M I I I I I I I I I i I 1 I 

GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180 

GPKDN PATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 211 
I M M M 1 I I : II : : I II M M M M I I I I t I I M I I M I 

GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 24 0 



The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 



1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE 
51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV 
101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH 
151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL 
201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 
251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 
301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 
351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL 
401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA 
451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 
501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 
551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 
601 TQPDRHSYGA LRTVNAAFTY RFK* 



Further work revealed the complete nucleotide sequence <SEQ ID 671>:

1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC
501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC
901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG
1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC
1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA
1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA
1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG



1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG
1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT
1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA
1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC
1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG
2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP
351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR
601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA
651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*



ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:


10 20 30 40 50 60 

orf 23-1 . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
I | I M I I I I I II I I I I I I I I I I M I M I I I II I 1 I M I I I I It 1 I I I M I I I I I I I I I I! 
orf2 3ng-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 4 0 50 60 

70 80 90 100 110 120 

orf 23-1 . peD PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
I : M I I I 1 I I I I 1 I I I I I I I I I I I M I I I I I I I M I I I- 1 I II I I I I I I I I I M I I I II I I 
orf23ng-l PFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 23-1 . pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 
I I 1 | I || I I I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I ! I I I I I I I M : II 
orf23ng-l SRIANYQING I PVADALADTGNANTAAYERVEWRGVAGLPDGTGE PS ATVNLVRKHPTR 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 2 3-1 .pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 
M I I f I I I' I I I I I! I I I I I I I II I I I : I I I I I I I I I I M I I I I I I I : M I II I I I I I I 
orf23ng-l KPL FE VRAEAGNRKH FGLGADVSGS LNAEGTLRGRLVST FGRG DSWRQLERSRDAEL YG I 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 2 3-1 - pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
I M | I I I I I I M I I I I I I I I I I I I 11 I I I t I I M I I I I II I I I i II I I I I I I : I I I : I I I 
orf2 3ng-l LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 2 3-1 ."pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
I l I I I I I I 11 II lit 1 I I I I I I I I I I I I I I I I M I I I I I M : N I II I I I I II I I I I I I I 
orf23ng-l NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 
310 320 330 340 350 360



370 380 390 400 410 420 

or^23-l pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
1 * 111:11 I I M I I I II I I I ! I M I I I M I I I I I M I I I II I I I I I I I I I I I 1 I I I I M : I 

orf23ng-l SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf23-l pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
MINIM I { I I II I I I I I I! I I I 1 I I ! I I I M 1 I : I I I : I I I : I I I I I I I I II I I I I I 
orf23ng-l FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 23-1 pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

I i i i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ll l I I l 1 1 1 1 1 1 1 1 1 l I i l I 1 1 l I I I I 1 1 1 l M M I 

orf23ng-l PYTG I VFDLTGNLS L YGS YS S LFVPQLQKDEHGS YLKPVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 23-1 . pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I | M I I I | I I I I I II II I I I I II I I I I I I I I I I I I I i I I II I I I I I I I I I I I I I I I I I 
orf2 3ng-l AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR 

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 23-1 . pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I M I I I I I I I I I I II I I I I II I I I : I I 1 I I I I I M I I I I I.: I I M M I : I I I I I M I I 
orf23ng-l DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 23-1 . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
Ml: I 1 M M M i M M M I M M I : I II II M M I I I I I I I I I II I II M II I M I I 1 
orf 23ng-l T^RAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 

orf 23-1. oep TYRFKX 
I I I I I I 

orf23ng-l TYRFKX 

In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:



sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-FERRIOXAMINE B
AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403 (D90745) Outer
membrane protein FhuE precursor [Escherichia coli] >gi|1651545|gnl|PID|d1015405 (D90746)
Outer membrane protein FhuE precursor [Escherichia coli] >gi|1787344 (AE000210)
outer-membrane receptor for Fe(III)-coprogen, Fe(III)-ferrioxamine B and
Fe(III)-rhodotrulic acid precursor [Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90
Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)
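
The percentages in this summary follow directly from the reported counts. The lines below are a minimal sketch of that arithmetic; the function name `blast_percent` is hypothetical, and truncating (rather than rounding) is an assumption chosen because it reproduces the figures printed above.

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Return count/aligned_length as a whole-number percentage, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * count) // aligned_length


# Counts copied from the summary line above (alignment length 717).
ALIGNED = 717
summary = {
    "Identities": blast_percent(228, ALIGNED),
    "Positives": blast_percent(350, ALIGNED),
    "Gaps": blast_percent(60, ALIGNED),
}
```
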

Query: 38 TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95 

T+ V TA + + Y+V+ T + MT R+IPQSV++++ Q+M DQ ++TL + 

Sbjct: 43 TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102

Query: 96 LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP VADALADTGNANTAA 147 

G S+ SDRA Y ++RG +1 NY ++GIP + DAL+D A 
Sbjct: 103 ENTLGISKSQADSDRALY YSRGFQI DN YMVDG I PTYFESRWNLGDALS DM AL 154 

Query: 148 YERVEWRGVAGLPDGTGEPSATVNLVRKHPTRKPLF-EVRAEAGNRKHFGLGADVSGSL 206 

+ERVEWRG GL GTG PSA +N+VRKH T + +V AE G+ AD+ L 

Sbjct: 155 FERVEWRGATGLMTGTGNPSAAINMVRKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214 

Query: 207 NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266 

+G +R R+V + DSW S GI++ D+ T + AG +YQ+ + 

Sbjct: 215 TEDGKIRARIVGGYQNNDSWLDRYNSEKTFFSGIVDADLGDLTTLSAGYEYQRIDVNSPT 274 



Query: 267 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 326
+++ G + ++ + A +W+ + +F ++ +F W+ ++



Sbjct: 275
Query: 327
Sbjct: 335
Query: 375
Sbjct: 395
Query: 433
Sbjct: 452
Query: 492
Sbjct: 505
Query: 552
Sbjct: 565
Query: 609
Sbjct: 625
Query: 669
Sbjct: 673



A V D ++ PG+ W++ R A + G Y LFG 



R+H+L+ G Y +N+Y +1 P+ I + Y F+ G + PQ Q++ Q DT 



Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 

LYAATRVTLADPLHLILGARYTNWRVDT LTYSMEKNHTT PYAGLVFDIND 504 



F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 



Query: 552 ATAAGR DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608

A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G+ +N 



P ++P + K+FT+Y LP P T+G GV Q +TD P RA

P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV TPYGTFRA E 672 

QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724
Q +YA+ D+ RY + L NV+NLF+K Y T + YG R + TY+F 

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 80

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 673>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC
351 TnAGTCGCCG ACGGGG..

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG..
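
Listings of this kind carry a position index at the start of each row and group the residues in blocks of ten. As a rough sketch of how such a listing can be joined into one contiguous string before further analysis, the helper `parse_listing` below (a hypothetical name, not part of the original work) strips the index, the spaces and the trailing ".." marker. The ORF24 residues are read with "VV" at positions 5-6 and 99-100, which is what the codons GTG GTT and GTT GTG in the nucleotide sequence encode.

```python
import re


def parse_listing(text: str) -> str:
    """Join a numbered, 10-residue-grouped sequence listing into one string."""
    pieces = []
    for line in text.splitlines():
        # Drop the leading position index (e.g. "51 "), if present.
        body = re.sub(r"^\s*\d+\s+", "", line)
        # Keep only residue letters and the '*' stop symbol.
        pieces.append(re.sub(r"[^A-Za-z*]", "", body))
    return "".join(pieces)


listing = """\
1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG.."""

orf24 = parse_listing(listing)
```

The resulting string has 122 residues, which agrees with the end coordinate given for the partial ORF24 sequence in the gonococcal alignment.
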
Further work revealed the complete nucleotide sequence <SEQ ID 675>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC
351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG
451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA
501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG
551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG
601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT
651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT
701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG
751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT
801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG
851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC
901 AAAGTTTGCG CCACGCTGAC GTAA



This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS
301 KVCATLT*



Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N.
meningitidis:



                    10        20        30        40        50        60
orf24a.pep  MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA
            |||||||||||||||||||||||||||||||||||| |||||||:|||||:|||||||||
orf24       MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf24a.pep  IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP
            |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf24       IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf24a.pep  TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf24       TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf24a.pep  PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA
            ||||||||||||||||||||||| ||||:||||||||||||||:||| ||| ||||||||
orf24       PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf24a.pep  SILIPARVLPILMELHTISVVFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS
            ||||||||||||||||||||||||||||| ||||||||||||:|||||||||||||||||
orf24       SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS
                   250       260       270       280       290       300

orf24a.pep  KVCATLTX
            ||||||||
orf24       KVCATLTX

The complete length ORF24a nucleotide sequence <SEQ ID 677> is:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA
101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC
151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG
451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA
501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG
601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT
701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG
751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT
801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC
901 AAAGTTTGCG CCACGCTGAC GTAA



This encodes a protein having amino acid sequence <SEQ ID 678>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS
51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*



It should be noted that this protein includes a stop codon at position 198. 
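
The position of an internal stop codon can be checked directly against the translated sequence. The following is a small sketch only: the helper name `internal_stops` is hypothetical, and the segment is residues 151-200 of the ORF24a translation as listed for SEQ ID 678 (with "VV" at positions 167-168), offset by 150 to recover absolute numbering.

```python
def internal_stops(protein: str) -> list[int]:
    """Return 1-based positions of '*' that occur before the final residue."""
    last = len(protein)
    return [i for i, aa in enumerate(protein, start=1) if aa == "*" and i != last]


# Residues 151-200 of the ORF24a translation, in blocks of ten.
segment = (
    "RVILKAVFFT"
    "TSATSVNVVA"
    "SEFSNAAFTT"
    "PGPDTPTLIT"
    "ASASPEP*NA"
)
SEGMENT_START = 151

# Convert positions within the segment back to positions in the full protein.
positions = [SEGMENT_START - 1 + p for p in internal_stops(segment)]
```

A terminal `*` is deliberately ignored, so only stops that interrupt the reading frame are reported.
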



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

                    10        20        30        40        50        60
orf24a.pep  MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA
            |||||||||||||||||||||||||||||||||||| |||||||:|||||:|||||||||
orf24-1     MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf24a.pep  IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP
            |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf24-1     IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf24a.pep  TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf24-1     TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf24a.pep  PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA
            ||||||||||||||||||||||| ||||:||||||||||||||:||| ||| ||||||||
orf24-1     PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf24a.pep  SILIPARVLPILMELHTISVVFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS
            ||||||||||||||||||||||||||||| ||||||||||||:|||||||||||||||||
orf24-1     SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS
                   250       260       270       280       290       300

orf24a.pep  KVCATLTX
            ||||||||
orf24-1     KVCATLTX

Homology with a predicted ORF from N. gonorrhoeae 

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from 

N. gonorrhoeae: 

orf24.pep   MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA  60
            ||||||||||||||||||||||||||||||||||:|||||||||||||||||:|||||||
orf24ng     MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA  60

orf24.pep   IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPIXSRMRATXSP 120
            |||||||||||||||||||||||||||||||||||||||||||||||||| |||||| ||
orf24ng     IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 120

orf24.pep   TG 122
orf24ng     TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180
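
The identity figure quoted for this alignment can be reproduced by counting matching columns over the overlap. The sketch below is illustrative only: `percent_identity` is a hypothetical helper, the alignment is assumed to be ungapped over the 121 overlapping residues, and ambiguous `X` residues are simply treated as mismatches against a defined residue.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# First 121 residues of each sequence, transcribed from the alignment above.
orf24 = (
    "MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA"
    "IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPIXSRMRATXSP"
    "T"
)
orf24ng = (
    "MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA"
    "IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP"
    "T"
)

identity = round(percent_identity(orf24, orf24ng), 1)
```

With 117 matching columns out of 121, this gives the 96.7% stated in the text.
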

The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 

1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG
451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG
501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA
601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT
701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG
751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC
801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC
901 AAAGTCTGCG CCACGCTGAC ATAA

This encodes a protein having amino acid sequence <SEQ ID 680>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS
51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 

50 10 20 30 40 50 60 

orf 24-1. pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 
I I I || M I I I II I I I I I I I I I I I I I I II I II I II : I I I I II I II 11 I I I I I I : I I M I I I 
orf24nq MRT AVVLLLIMPMAASSAMMPEMVCAGVS PGTAIMSKPTEQTAVMAS S LS SVNT PAS AAA 

10 20 30 40 50 60 

55 

70 80 90 100 110 120 

orf 24-1 . pep IIPSSSETGINAPLKPPTALEAIMPPFFTAS FSNAKAAWPCVPQTLKP I SSRMRATESP 
- | M | | I I I I I I I I I I I I I I II I II I I I I I M I I I I I I I I II I I M I I I I I I I I I I I I I I 1 
orf24nq 1 1 PS SSETG I NAP LKPPTALEAIMPPFFTAS FSNAKAAWPCVPQTLKP I SSRMRATESP 

60 70 80 90 100 110 120 
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130 140 150 160 170 180 

orf 24-1 .pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
I ! M I I I I 1 It : I I I I I i I I I I I i I I I I I I I I II I I M t I I I I I 1 I : : II M I : I I : I I 
orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 
5 " 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24-1 . pep PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
I I II I I I I II I I I I I I I I I I I ! I I I I I I i I I I I I I I I ! I I I I M I I I I I I I I I I I I I I 
10 orf24ng PGPDTPTLITASAS PE PWNAPAINGLS STALQNTTI LAQPKPSGVI SAVRLMVS PASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1 .pep SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
1 5 ~ I I I I I I I I I M I I I I I I I M I I II II I I I! I I I I I II I I I I : I I ! I I I I I I I I I I t ! I I 

orf24ng SILIPARVLPILMELHTISWFIASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

20 . orf 24-1. pep KVCATLTX 

I I I I II I I 

orf24ng KVCATLTX 

Based on this analysis, including the presence of a putative leader sequence (first 18 aa - double-underlined)
and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>:

1 ..ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA
51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT
101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG
151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>:

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER
51 IQYLRGYSID *
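
The correspondence between the partial DNA sequence and its amino acid sequence can be checked by translating in frame with the standard genetic code. This is a minimal sketch under stated assumptions: `translate` and `CODON_TABLE` are illustrative names, the reading frame is assumed to start at the first listed base, and any codon containing an undetermined base (such as N) is reported as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons become 'X', stops become '*'."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Partial ORF25 DNA sequence as listed above (leading ".." omitted).
orf25_partial = translate("""
ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA
AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT
ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG
ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG
""")
```

The same function maps the lower-case and N-containing codons found in some of the listings to the X residues shown in the corresponding translations.
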

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG
51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT
201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA
351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC
451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT
501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC
601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG
701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC
901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG
This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from strain A of N.
meningitidis:

                                                  10        20        30
orf25.pep                                 TDVQKELVGEQRKWAQEKISNCRQAAAQAD
                                          |||||||||| |||||||||||||||||||
orf25a      VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD
           250       260       270       280       290       300

                    40        50        60
orf25.pep   RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
            |||||||||||||||||||||||||||||||
orf25a      RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
           310       320       330

The complete length ORF25a nucleotide sequence <SEQ ID 685> is:

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG
51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT
201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA
351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC
451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC
601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA
651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG
701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC
901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG

This encodes a protein having amino acid sequence <SEQ ID 686>: 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR
51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA
201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

55 - 10 20 30 40 50 60 

orf 2 5a pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 

" | | | I 1 I I I I I I I I I I I If I I I I 1 I I I II 1 I I I I 1 I II M I I I II I I I II i M I I II 

orf 2 5-1 MYRKL I ALPFALLLAACGRE EPPKALECAN PAVLQGIRGN I QETLTQEARS FARE DGRQF 



10 20 30 40 50 60


70 80 90 100 110 120 

orf25a.pep VDADXIIAAAXXXXXSLEHASETQEGGRT FCXADLNITVPSETLADAKANSPLLYGETAL 

* * I I I I Mill I II II II I M I I I II I I I I I I I I I I I I I II I I 1 I I ! I I 1 I I I I I 

orf 25-1 VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf25a pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 
' ' I I II I I II I I M 11 I I I 1 I I II I II M I I I I I : I I I II II I I M I M I I I I I I I I I I I I I 

orf25-l SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf25a pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 
I I | | I | I | || I | I I I I III I I I I : I II I I I I I I I I I I I I I I I : I MINIMI 
orf2 5-l MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 

I | | I ] M M II I I I I i I I I I I I I I I I I I II I M I M I I M I I I I I I I I i I I I II I I I II 

orf 25-1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

orf 25a .pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
I II M I II I I I I I I I I I I I I I II I 11 I II I I I I I I M II 
orf 25-1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

310 320 330 



Homology with a predicted ORF from N. gonorrhoeae

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from
N. gonorrhoeae:

orf25.pep                                 TDVQKELVGEQRKWAQEKISNCRQAAAQAD  30
                                          ||||||||||||||||||||||||||||||
orf25ng     VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308

orf25.pep   RQEYAEYLKLQCDTRMTRERIQYLRGYSID  60
            ||||||||||||||||||||||||||||||
orf25ng     RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338



The complete length ORF25ng nucleotide sequence <SEQ ID 687> is: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG
51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT
151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT
201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG
301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA
351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT
401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC
451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC
601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG
701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc
901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA
951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG
1001 GCTATTCCAT CGATTAG



This encodes a protein having amino acid sequence <SEQ ID 688>:



1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD
151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*
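
The relationship between a nucleotide listing and the protein it encodes can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and are not part of the original disclosure. Codons containing an undetermined base are reported as X, mirroring the convention used in the listings.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA string (whitespace and case ignored) in frame 1.

    Codons containing anything other than A, C, G or T become 'X'.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)

# First 51 and last 18 nucleotides of the ORF25ng listing above.
print(translate("ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG C"))
print(translate("GGCTATTCCATCGATTAG"))
```

Applied to the full 1017-nucleotide listing, the same function should reproduce the 338-residue sequence followed by the stop symbol.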

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 

                     10        20        30        40        50        60
orf25-1.pep  MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF
             ||||||||||||||||||||||||||||||||||| |||:||||||||||||||||||||
orf25ng      MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf25-1.pep  VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL
             |||||||||||||||||||||||||||||||||||||||||||||||:||||||||||:|
orf25ng      VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf25-1.pep  SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV
             :|||:||||||||||||||||||||||:||::|||:|||||||:||||||||||||||||
orf25ng      ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf25-1.pep  MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP
             ||||||| ||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf25ng      MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf25-1.pep  DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC
             || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf25ng      DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC
                    250       260       270       280       290       300

                    310       320       330      339
orf25-1.pep  RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
             |||||||||||||||||||||||||||||||||||||||
orf25ng      RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
                    310       320       330
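
The percent identity quoted for an alignment is the number of identical aligned positions divided by the overlap length. A minimal sketch follows, assuming the two sequences are already aligned to equal length with `-` for gaps; `percent_identity` is an illustrative name, not a function of the alignment program used in this work. It is applied here to residues 61-120 of ORF25-1 and ORF25ng, which differ at two positions.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identical positions, percent identity) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return matches, 100.0 * matches / len(aligned_a)

# Residues 61-120 of each sequence, copied from the listings in this example.
orf25_1 = "VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL"
orf25ng = "VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL"

matches, pct = percent_identity(orf25_1, orf25ng)
print(matches, round(pct, 1))
```

Over the full 338-residue overlap the two sequences differ at 14 positions, and 324/338 corresponds to the 95.9% figure stated above.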

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
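
A prediction of this kind can be illustrated with a simple pattern search. The sketch below is an assumption-laden illustration, not the analysis program actually used: it encodes the pattern commonly quoted for the PROSITE prokaryotic membrane lipoprotein lipid attachment site entry (PS00013), together with its two additional rules (the cysteine lies between positions 15 and 35, and at least one Lys or Arg occurs in the first seven residues). The names `LIPOBOX` and `find_lipoprotein_site` are hypothetical.

```python
import re

# Assumed regex form of the PS00013 pattern:
# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")

def find_lipoprotein_site(protein):
    """Return the 1-based position of the candidate lipidated Cys, or None."""
    # Rule: at least one Lys or Arg in the first seven residues.
    if not re.search(r"[KR]", protein[:7]):
        return None
    for start in range(len(protein)):
        m = LIPOBOX.match(protein, start)
        # m.end() is the 1-based position of the final Cys of the match.
        if m and 15 <= m.end() <= 35:
            return m.end()
    return None

# N-terminal 50 residues of the gonococcal protein <SEQ ID 688>.
orf25ng_nterm = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEAR"
print(find_lipoprotein_site(orf25ng_nterm))
```

Under these assumptions the candidate site in the gonococcal sequence ends at the cysteine at position 17.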

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.
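
The 37kDa figure is consistent with a back-of-envelope estimate from the sequence length. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue; it is not the method by which the stated mass was obtained, and `approx_mass_kda` is an illustrative name.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not a measured value

def approx_mass_kda(n_residues):
    """Estimate protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

# ORF25-1 is 338 residues long according to the alignments above.
print(round(approx_mass_kda(338), 1))
```

Fusion partners such as the GST or His tag add to the apparent mass seen on SDS-PAGE, so the estimate applies to the ORF alone.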
Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1.
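
A plot of this type is usually produced by averaging a per-residue scale over a sliding window. The sketch below is illustrative only: the scale used for Figure 16E is not stated in this section, so the Kyte-Doolittle hydropathy scale is substituted here as an assumption (positive values are hydrophobic, so hydrophilic regions appear as minima). The function name and the window length of 7 are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (substituted scale; see lead-in).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=7):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[res] for res in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# N-terminal 30 residues of ORF25-1, taken from the alignment above.
profile = hydropathy_profile("MYRKLIALPFALLLAACGREEPPKALECAN")
print(len(profile), round(max(profile), 2))
```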



Example 82 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>:



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA
201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T..
//
851 ..AC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC
951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAA..



This corresponds to the amino acid sequence <SEQ ID 690; ORF26>:



1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV
51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN...
//
251 ..TSLV
301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KK..
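
The X residues in the partial ORF26 sequence arise where the partial DNA sequence contains ambiguity codes (s, y, w, k) that leave the codon undetermined. The following minimal sketch, assuming the standard IUPAC nucleotide codes and the standard genetic code, expands an ambiguous codon into every codon it could represent and reports X unless all of them encode the same residue. The function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate_ambiguous_codon(codon):
    """Return the encoded residue, or 'X' if the ambiguity changes the residue."""
    options = [IUPAC[base] for base in codon.upper()]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"

# Codon 68 of the partial sequence ("Gsy") may encode Ala or Gly, hence X.
print(translate_ambiguous_codon("Gsy"))
# An ambiguity confined to a silent position still gives a defined residue.
print(translate_ambiguous_codon("GGN"))
```

In the complete sequence the corresponding codon is GGC, which encodes the glycine seen at position 68 of ORF26-1.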



Further work revealed the complete nucleotide sequence <SEQ ID 691>:



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA
201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT
301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC
401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC
451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG
551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT
601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT
651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG
701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT
751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC
801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT
851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC
951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAACGCG CCAACGCCTG A

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV
51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF
101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS
151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF
201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA
251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV
301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KKRANA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical transmembrane protein HI1586 of H. influenzae (accession number P44263)
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the
N-terminus and C-terminus, respectively:

Orf26 1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60
M+LID+S S +S+VP LA+ LA+ TRRV L +L V
HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73



Orf26 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109

// 

Orf 2 6 8 6 IFTSLLTYSGS— NTSLVFGGTCGVFAWLCTL — GTIKTADYPKAVWQGAKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HII566 299 VFSVLGTFENTWGTSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAI 358 

Orf26 142 XXXXXXX ST VVGEMHTG DYLST LVAGN I HPG FLPVI LFLLAS VMAFATGTSWGT FG IMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNI PMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

Orf26 202 IAAAMAVKVE PAL 1 1 PCMS AVMAGAVCGDHCS PI SDTTILSSTGARCNHI DHVTSQXXXX 261 

IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 

HI1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

Orf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
HI1586 479 AT VAT AT S I G Y I WG FT Y S G LAG FAAT AV S L I V 1 1 FAVKKR 519 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N.
meningitidis:



50 



55 



60 



orf 26 . pep 
orf26a 

orf 26. pep 
orf 26a 



10 20 30 40 50 60 

MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILXXVAFLV GGNPVDGLTHLKDMV 

I I I I M II II I I II I I I I M I I I I I I I II I I I I I I I I I I M I I M I I I II I I ! I I II I 
MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILVGVAFLV GGNPVDGLTHLKDMV 



10 



20 



30 



40 



60 



70 80 90 99 

VGLAWSDXDWSLGKPK ILVFXILLGIFTSLLTY SGSNXX 

I I I II 1 I MINIM Ml II I I I II I I M M I I I 

VGLAWSDGDWSLGKP KXLVFLILLGIFTSLLTY SGSNQAFADWAKRHIKKR RGAKMLTAC 
70 80 90 100 110 120 



orf26.pep
orf26a 



LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMP VSSWGASIIA 

180 



130 



140 



150 



160 



170 



orf 26 .pep 
orf 26a 

orf 26 .pep 
orf 26a 

orf 26 .pep 
orf26a 

orf 2 6 . pep 
orf26a 

orf 26 .pep 
orf 26a 

orf 26 . pep 
orf 26a 



TLAGLLVTYKITEYTPMGTFVAMSLMNYY ALFALIMVFWAWFSFDI GSMARFEQAALNE 

230 



190 



200 



210 



220 



240 



100 110 

TSLV 

MM 

AHDETAVSDGSWGRVY ALIIPVLALIASTVSAMI YTGAQASETFSILGAFENTDVNTSLV 
250 260 270 280 290 300 

120 130 140 150 160 170 

FGGTCGVFAWLCTL GTIKTADYPKAVWQGAKS MFGAIAILILAWLISTW GEMHTGDYL 
M M M I : M M I M M M ! M M I I I M i M M M M M I M M M M M M M M M 
FGGTCGVLAWLCTL GTIKIADYPKAVWQGAKSM FGAIAILILAWLISTW GEMHTGDYL 
310 320 330 340 350 360 

180 190 200 210 220 230 

S T LVAGN I H P GFLPVILFLLASVMAFA TGT S W GTFGIMLPIAAAMAVKV E P ALIIPCMSA 
M M I M M M M I M I I I I i I I M I M I I M I I I I M I I I ! i II M I : I : M M M M 
ST LVAGN I HP G FLX V I L FLLAS VMAFA TGT SW GTFGIMLPIAAAMAVKV DP SLIIPCMSA 

3 90 



370 



380 



400 



410 



420 



240 
VMAGAVf 



250 260 270 280 290 

:gdhcspisdttilsstgarcnhidhvtsqlpy altvaaaaasgylalgl tksa 

M M M M M M M M M M M M i i I M M M M M M M M M M M M M M M M I 

vmagavcg dhcspisdttilsstgarcnhidhvtsqlpy altvaaaaasgylalgl tksa 

430 440 450 460 470 480 

300 310 

llgfgttgivlavlifl lkdkk 

M M I : I I M M M M M I M I 

llgfgxtgivlavlifl lkdkkranax 

490 500 



The complete length ORF26a nucleotide sequence <SEQ ID 693> is:



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA
201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT
301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC
401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC
451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG
551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT
601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT
651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG
701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC
751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC
801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT
851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC
951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG
1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC
1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT
1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAACGCG CCAACGCCTG A

This encodes a protein having amino acid sequence <SEQ ID 694>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV
51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF
101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS
151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF
201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG
251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV
301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD
501 KKRANA*

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap:

10 20 30 40 50 60 

orf26a uev mqlidySHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
" * I I I M I I I I I II I I I I M I M I I I I I I I I M I I I I 1 M H M I I II I II I I I I M I I I I I 

or^26-l mqlIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 4 0 50 60 

70 80 90 100 110 120 

orf26a peo VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 
MINIUM II I I I I I I I I I M I I I I I M i I M M II M I i I I I I I I I I I I I I I I M I 
orf2 6-l VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf26a pep LVFVTFIDDYFKSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLM PVSSWGASIIA 

I I II I I i I I I I I I I I I I I I I I I I I I I I I I 1 : I M ! I I I I I I I I I I M M I I I I I I M I I 
orf26-i LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf26a r>ep TLAGLLVTYK I TEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFD I GSMARFEQAALNE 

M | | | | I I | I I i M I I 1 I I I I I I I I I I II I I I I I I I i I I M I I M I I I I I I II I I I M I I 
or f 26-1 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 

250 260 270 280 290 300 

or f 2 6a pep AHDETAVS DGSWGRVYALI I PVLALI ASTVSAMI YTGAQASET FS I LGAFENT DVNTS LV 

I I I I I I I I I ::' M I II I I 1 I I I I I I I I I II I I I I I I I I I I M I M I M I I I 11 I I II 1 I 
o^-f2 6-l AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 -290 300 

310 320 330 340 350 360 

orf26a pep FGGTCG VLA WLCT LGT I K I AD YPKAVWQGAKSM FGAI AI L I LAW LIS T WGEMHTG D YL 

I M I I I I I I 1 I I I I I I I I I I I I M I I I M I I I M I II I M I M I I I I I II I M II I I I I 
orf2 6-l FGGTCG VLA WLCT LGT I KT AD Y PKAVWQGAKSMFGAI AI LI LAWL I S T WGEMHTGDYL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf26a.pep STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA 
I I I I I I I I I I I I I I I I M II M I M M I II I M I I I I I I I M I II I I I : I : I I I I I I I I 
orf26-l STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf26a.pep VMAGAVCG DHCS P IS DTT I LS STGARCNH I DH VTSQL PY ALT VAAAAASG YLALGLTKSA 

I I I I I I I I I I I I I I I I M I I II I I I I M I I M I II I I I I I I I I I I M ! I I M I I 1 I I M I 
or f26-l VMAGAVCGDHCS PIS DTT ILSSTGARCNHIDHVTSQLPYALTVAAAAASG YLALGLTKSA 

430 440 450 460 470 480 

4 90 500 
orf26a.pep LLGFGXTGIVLAVLIFLLKDKKRANAX 
I I I I I : I I i I I I I ! I t I j 1 I i I I I i I I 
o r f 2 6 - 1 LLG FGTTG I VLAVL I FLLKDKKRANAX 

490 500 



Homology with a predicted ORF from N. gonorrhoeae

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus, 
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae: 



or f 2 6 pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV 60 
I I I I I M I M I I I I I I I I M I I I I II M I I M I I I I I I i I I I I I I I i I I II I 11 I I I I 

10 orf 2 6ng mqlidyshsffsvvppflalalavitrrvllslgigilvgvaflvggnpvdglthlkdmv 60 

orf26.pep VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

11111:1 I I I I I I i I M I I I I I I I I M I I I I I I I I 

orf2 6ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120 



15 



// 



orf 2 6 pep TSLVFGGTCGVFAWLCTLGTIKTADYPKA 32 6 

I I I I I I I I I I I : M I II I : I 1 I I I I M I I I 
20 orf2 6ng ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAWLCTFGTIKTADYPKA 326 



25 



orf 2 6 pep VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386 

M I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I I M I I I 11 I I I I I M I M I I I II I I 
orf2 6ng VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386 

orf 2 6. pep ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 4 46 

| I I M I I I I I M I I I I I I I I I II I I I I I 1 I I I I I I I I I I M M I I M I I I I I I I I I I M I 
orf2 6ng ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 4 46 

30 orf 2 6. pep CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 502 

\ M I I II I I I ! I I I I I I II I I I I I I I I I I It I I M M I I I I I I I I I I M I I I II I I 
orf26ng CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLG FGTTGIVLAVLIFLLKDKKRADV 506 

The complete length ORF26ng nucleotide sequence <SEQ ID 695> is: 

1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA
201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT
251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT
301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC
401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC
451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG
551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT
601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT
651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG
701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT
751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC
801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT
851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC
951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtcccGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA
1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAACGCG CCGACGTTTG A

This encodes a protein having amino acid sequence <SEQ ID 696>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV
51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF
101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS
151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF
201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA
251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV
301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP
401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KKRADV*

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 

10 20 30 40 50 60 

orf26-l pep MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
I | I I I i I I I 1 I t I t I I M I I I I I I I It II I I I I I I I I I I M M i II I I I t I I II I I 1 I f I 
orf26nq MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 26-1 pep VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 
| | | | | : I | | | I I I I I II I I I II I I I I I I I I 1 I I I I M I I I I M II I I I M I I I I II I I I 
orf26nq VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 

70 80 90 100 110 120 



25 



30 



35 



40 



130 140 150 160 170 180 

0^f26-l pep LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 
| | | I I I I I I I I I I I I I I I I I I I i I I I I I I I I : I I I I I I I 1 I I : I M I I I I I I I I I I I I II 
orf26nq LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf26-l pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 
I | j | | M I I 1 M I I I I I I I I I I I I M M II I I I II I M I I I II I I I I I I I I I I I I M I I I 
orf2 6nq TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 26-1 . oep AHDETAVSDATKGRVYALIIPVLALIASTVSAMI YTGAQASETFSILGAFENTDVNTSLV 
I : I I I I : I I I I I I I I II I I I I M I I I I I I I I I I M I II I I I I I II I I I I M I I I II I I I 1 
orf26na AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 



45 



50 



310 320 330 340 350 360 

orf 26-1 .pep FGGTCGVLAWLCTLGTIKTADY PKA VWQG AKSMFGAIAI LI LAWLISTWGEMHTGDYL 
I | | | | | | | I I I I I I : I I I I I I II I I I I I I I I II M I I I I I I I I I I I I II I I I I I I I I I I I 
orf2 6ng FGGTCGVLAVVLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 26-1 . pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 
I | |-| II I I I I I M I I I I 11 I I I I I M I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
orf26ng STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 



55 



60 



65 



430 440 450 460 470 480 

orf 26-1 . pep VMAGAVCGDHCS P I SDTT I LS STGARCNHI DHVTSQLPYALTVAAAAASGYLALGLTKSA 
M | I I I I I I II I I II I I I I I I M I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
orf2 6ng VMAGAVCGDHCS P I SDTTILS STGARCNHI DHVTSQLPYALTVAAAAASGYLALGLTKSA 

430 440 450 460 470 480 

490 500 
orf 2 6-1 .pep LLGFGTTGIVLAVLIFLLKDKKRANAX 

1 I I I I! II 11 II II II I I I I I II I : : 
o r f 2 6ng LLG FGTTG I VLAVLI FLLKDKKRAD VX 

4 90 500 

In addition, ORF26ng shows significant homology to a hypothetical H. influenzae protein:
sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H.
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519
Score = 538 bits (1370), Expect = e-152

Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)

Query: 1 MQLIDYSHSFFSWPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRR L +L V 

Sbjct : 14 MELIDFSSSWISIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Query : 61 VG LAW ADGDW SLGKPKI LVFL I LLG I FTS LLT Y S G SNQAFADWAKRHI KNRCGAKMLTAC 120 

V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 

Sbjct : 7 4 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 

Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTAS PMCVLMPVSSWGASIIA 180 

LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+ PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSIAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 240 

+ GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252 

Query: 24 1 AQ DE T AAS DAT KGR V Y AL 1 1 PVLAL I AS TV S AMI YT G AQA SETFSILGAFENTDVN 2 96 

+ D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 

Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTWG 312 

Query: 297 TSLVFGGTCGVL — AWLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTWGEM 354 

TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 

Sbjct: 313 TSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Query: 3 55 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVE PALI 414 

TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHI DHVTSQXXXXXXXXXXXXXXXXXX 474 

+ PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 L P C L S AVMAG AVCG DHCS P VS DTT I LS STG AKCN H I DH VTTQL PYAAT V AT AT S I G Y I W 492 

Query: 47 5 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 

S L GF T + L V+IF +K + 
Sbjct: 4 93 GFTYSGLAGFAATAVSLIVIIFAVKKR 519 
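
The percentages in the summary line of this search report follow directly from the counts it gives. As a minimal sketch (the function name and the regular expression are illustrative, and only the summary line shown above is assumed as input), the counts can be parsed and the percentages recomputed:

```python
import re

def parse_blast_summary(line):
    """Map each 'Name = n/m' item to (n, m, percentage rounded to 0.1)."""
    stats = {}
    for name, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        n, m = int(num), int(den)
        stats[name] = (n, m, round(100.0 * n / m, 1))
    return stats

summary = "Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)"
for name, (n, m, pct) in parse_blast_summary(summary).items():
    print(name, n, m, pct)
```

The recomputed values (55.2%, 68.2% and 1.4%) agree with the whole-number percentages printed in the report.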

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 83 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>: 

1 ..AAGCAATGGT ATGCCGACGN .AGTATCAAG ACGGAAATGG TTATGGTCAA
51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT
101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG
151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT
201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>:

1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP*

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC
51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA
101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG
151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA
201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC
251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA
301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA
351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT
401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC
451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA
501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG
551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT
601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG
651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA
701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>:

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV
51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK
101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES
151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS
201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF27 shows 91.5% identity over a 82aa overlap with an ORF (ORF27a) from strain A of N.
meningitidis:

                                                  10        20        30
orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG
                                          |||||| :||||||||||||||||||||||
orf27a      LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG
                140       150       160       170       180       190

                    40        50        60        70        80
orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX
            ||||||||:|| ||||||||||||||| | |||||||||||||| ||||||||
orf27a      RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX
                200       210       220       230       240

The complete length ORF27a nucleotide sequence <SEQ ID 701 > is: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 702>: 

1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP*

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 

                    10        20        30        40        50        60
orf27a.pep  MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF
            ||||||||||||||||||||||| |||||||||||||| |||||||||||: |||||| |
orf27-1     MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27a.pep  XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP
             ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf27-1     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27a.pep  NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN
            ||||||||||||||||||| ||||||||||||||||||||||||||||||:|||||||||
orf27-1     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27a.pep  DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG
            |||||||||||||||||||||:|| ||||||||||||||| |||||||||||||||| ||
orf27-1     DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27a.pep  YLIEPX
            ||||||
orf27-1     YLIEPX



Homology with a predicted ORF from N. gonorrhoeae 

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from
N. gonorrhoeae:

orf27.pep                               KQWYADXSIKTEMVMVNDEPAKILTWDESG  30
                                        |||||| |||||||||||||||||||||||
orf27ng   LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193

orf27.pep RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP  82
          |||||||||||:||||||||||||||||| ||||||||||||||||||||||
orf27ng   RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 

1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 704>: 

1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 

                    10        20        30        40        50        60
orf27-1.pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
            |||||||||| |||||||||||||||||||||||||||||||||||||||:|||||||||
orf27ng     MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27-1.pep YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27-1.pep NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27-1.pep DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
            ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf27ng     DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27-1.pep YLIEPX
            ||||||
orf27ng     YLIEPX

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF27-1 (24.5 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is
a surface-exposed protein and a useful immunogen.

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VM

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT 

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF47 shows 99.4% identity over a 172aa overlap with an ORF (ORF47a) from strain A of N.
meningitidis:



                    10        20        30        40        50        60
orf47.pep   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf47a      MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47.pep   IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170
orf47.pep   MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

orf47a      GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240
The complete length ORF47a nucleotide sequence <SEQ ID 709> is:

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA
51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG
101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG
151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC
201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG
251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT
401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC
451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT
651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT
751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC
851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC
1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA



This encodes a protein having amino acid sequence <SEQ ID 710>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap:

                    10        20        30        40        50        60
orf47a.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47a.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47a.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47a.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
            |||||||||||||||||||||||||||||||||||||||||||: ||||:||||||||||
orf47-1     GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47a.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47a.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47a.pep  LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47-1     LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380

Homology with a predicted ORF from N. gonorrhoeae 

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from
N. gonorrhoeae:

ORF47   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV  60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORF47ng MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV  60

ORF47   IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
        |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
ORF47ng IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120

ORF47   MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172
        ||||||||||:||||||||:||||||||||||||||||||||||||||||||
ORF47ng MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180


The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 712>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS

201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP

251 IEETSCGSVA GICYRLGNSS G

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540):

TM segments in ORF47ng

INTEGRAL Likelihood -5.63 Transmembrane 52 - 68
INTEGRAL Likelihood -3.88 Transmembrane 169 - 185
INTEGRAL Likelihood -3.08 Transmembrane 82 - 98
INTEGRAL Likelihood -1.91 Transmembrane 134 - 150
INTEGRAL Likelihood -1.44 Transmembrane 107 - 123
INTEGRAL Likelihood -1.36 Transmembrane 227 - 243
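The hydrophobic character of a listed segment can be illustrated with a simple hydropathy average. The sketch below uses the Kyte-Doolittle scale as a stand-in; it is an assumption made for illustration and not necessarily the method of the program that produced the likelihood values above. The helper name `mean_hydropathy` is hypothetical, and the sequence is the first 100 residues of ORF47ng <SEQ ID 712>.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str, start: int, end: int) -> float:
    """Mean Kyte-Doolittle score of residues start..end (1-based, inclusive)."""
    window = seq[start - 1:end]
    return sum(KD[res] for res in window) / len(window)


# Residues 1-100 of ORF47ng.
orf47ng_head = (
    "MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHE"
    "MIWGYAGLVVIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPG"
)

# Segment 52-68 is the first entry in the table above.
print(round(mean_hydropathy(orf47ng_head, 52, 68), 2))
```

A strongly positive average over a window of this length is the usual rule-of-thumb indicator of a membrane-spanning stretch on this scale, consistent with residues 52-68 being reported as a transmembrane segment.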



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 



1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA
51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG
101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG
151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC
201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG
251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT
401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC
451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT






WO 99/24578 






PCT/IB98/01665 



651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC
751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC
851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC
1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA



This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap:

                    10        20        30        40        50        60
orf47-1.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47-1.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
orf47ng-1   IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47-1.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||:||||||||:||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47-1.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
            | ||||||||||||||||||||||||||||||||||:||||||: ||||:||||||||||
orf47ng-1   GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47-1.pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47-1.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47-1.pep LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47ng-1   LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380



Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri:

gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396
Score = 155 bits (389), Expect = 5e-37
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%)

Query 1 PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY WHAHEMIWGYAGLV 59 

P+W +AFRPF+ +LY L++ LW +TG GF WK HEM++G+A + 

Sbjct: 14 PIWRIAFRPFFIAGSLYALLAIPLWVAAWTGLWP--GFQPTGGWLAWHRHEMLFGFAMAI 71 

Query: 60 VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119 

V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: 72 VAGFLLTAVQTWTGQTAPSGNRLVGIJ^WIJU^L-GWLFGLPAAWLAPLDLLFLVALVW 130 

Query: 120 CMALPVIRSQNRRNYVAVFAIFVLGGTHAAFXXXXXXXXXXXXXXXXXXXXXMVSGFIGL 179 

MA + + +RNY V + ++ G +V+ + L 

Sbjct: 131 MMAQMLWAVRQKRNYPIVWLSLMLGADVLILTGLLQGNDALQRQGVLAGLWLVAALMAL 190 

15 Query: 180 I GMRI I S FFTSKRLNVPQI PS P -KWVAQAS LWL PMLTAI LMAHGV MPWLSAAFAFA 234 

IG R+I FFT + L P W+ A L + A+L A GV PL FA 

Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249 

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 
20 GV +++ RW+ K + K +LW L L+ + + +F A 

sb j ct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGLALWHFGLLAQSSPSLHALSV 309 

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353 

M+AR LGHTG + P+AFL FS + 

25 Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FN LGTAARV FLS VAW PVGG LW 365 






Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384 

++V + LA +Y W+Y P L+ R DG PG 
Sbjct: 36 6 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 3 96 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 85 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 715>:

1 ..ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT

551 ATTCTCCAGC CGCCGAAATC..

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI..

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from
N. gonorrhoeae:
orf67.pep MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 30

| || 1 | | II ! II I t I I I I M I M I I 

14 6 



orf67ng TNFEIAVLSGMTVRVFYCARPAPVNGGRLK^ 

orf67 pep VFQASPVWTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 90 

I I I I I I I I I : I : I I III!:: : : : I I I I I : I I I : ' 

VFQAS PVVVAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRNCCVS I 206 

150 

266 



orf 67ng 

]0 or f 67 . pep X WXXXXSRGFXXHRMNI^FNVSVOT 



orf 67ng 



TRVGGKSTCYFFSRI DAVS DVSVGDARTDIGFEFWE FE I VNGGQAERRNGVECAVFLMF 



orf67 dpd CLGFFW WYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI 190 

° r * P P |,| :: |: I: : I : I I I I ! M HIM: 

orf 67ng rllVFYVKLVAAKSFIILSFQLFYVHGIFIWPFPVTGIIRGDAPAAEWADRHPGVDGM 326 

The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 718>:

1 M^SE^VGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL

51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR

101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA

151 SPVVVAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR

201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF VVEFEIVNGG

251 QAERRNGVEC AVFLMFRLLV FYVKLVAAKS FIILSFQLFY VHGIFIVVPF

301 PVTGIIRGDA PAAEVVADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI

351 IVGNAFGGVG *

Based on the presence of several putative transmembrane domains in the gonococcal protein, it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 86

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA...

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA...

Further work revealed the complete nucleotide sequence <SEQ ID 721>:

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT

501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>:

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP

151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ

201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ*

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H. influenzae (accession number P45280)
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78:   4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
           FL  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GV
DedA:   20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78:  62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
           L GD  M+  GRI+G   L F PI  I+T  R   V+EKF +YGN VLFVARFLPGLR
DedA:   80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145
           +++ +GI+R+VSY+RF+++D  AA
DedA:  140 IYMVSGITRRVSYVRFVLIDFCAA 163

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf78.pep   MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78a      MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf78.pep   VLVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRT
            |||||||||||||||||  | | ||| |||| || |||||||||||||||||||||||||
orf78a      VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                    70        80        90       100       110       120

                   130       140
orf78.pep   AVFVTAGISRKVSYLRFIIMDGLAA
            |||||||||||||||||:|||||||
orf78a      AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                   130       140       150       160       170       180

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 

1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT

501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>:

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

                     10        20        30        40        50        60
orf78a.pep   MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78-1      MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf78a.pep   VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
             ||||||||||||||||||||:||||||||||||| |||||||||||||||||||||||||
orf78-1      VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf78a.pep   AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
             |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||:
orf78-1      AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI
                    130       140       150       160       170       180

                    190       200       210       220
orf78a.pep   LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
             ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::||
orf78-1      LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX
                    190       200       210       220

Homology with a predicted ORF from N. gonorrhoeae 

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 

orf 78. pep XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137 
40 II I I I It II I i I I I I M I I I II I I I I I I II 

orf7 8ng YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32 

orf78.pep IIMDGLAA 145 
: I I I I 1 I I 

45 orf78ng LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92 

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 

1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL
51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ
101 LSEKRAKRKA EKAAKKAAQK QQ*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 

1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT
51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT
101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG
151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT
201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC
251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA
301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT
401 ATGTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC
451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT
501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC
551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG
601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA
651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa



This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>:

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap:

                     10        20        30        40        50        60
orf78-1.pep  MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
             |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78ng-1    MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf78-1.pep  VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT
             ||:|||:|||||||||||||:||||||||||||| |||||||||||||||||||||||||
orf78ng-1    VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf78-1.pep  AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI
             |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||:
orf78ng-1    AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                    130       140       150       160       170       180

                    190       200       210       220
orf78-1.pep  LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX
             ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::||
orf78ng-1    LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
                    190       200       210       220



Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae:

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20)
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212
Score = 223 bits (563), Expect = 7e-58

Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)

Query: 5   LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct: 21  LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query: 63  AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct: 81  AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
           ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200

Query: 183 VL 184
            L
Sbjct: 201 YL 202
Based on this analysis, including the presence of putative transmembrane domains, it is predicted
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.

Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>:

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT
51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG
101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC
151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA
201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG
251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC
301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA
351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC
401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C...

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH...

Further work revealed the complete nucleotide sequence <SEQ ID 731>:

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT
51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG
101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC
151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA
201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG
251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC
301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA
351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC
401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC
451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 
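
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the listing with the standard genetic code. The sketch below is illustrative only and is not the software used to prepare these sequences; the function names are hypothetical. It assumes the standard codon table and treats IUPAC ambiguity codes (such as `N`, `s` or `y`, which appear in some listings) by reporting `X` whenever the possible codons do not all encode the same residue.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; return 'X' if ambiguity changes the residue."""
    choices = [IUPAC.get(base, "ACGT") for base in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate a listing in frame 1, ignoring whitespace; '*' marks a stop."""
    seq = "".join(dna.split()).upper()
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


# First and last rows of the listing above, without position numbers.
print(translate("ATGAAAAAAT TATTGGCGGC CGTGATGATG"))
print(translate("CACGGCGAAG CGCATCAGCA CTAA"))
```

Position numbers must be removed before translation, because digits would otherwise be read as unknown bases.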

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the
following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N. 
meningitidis: 

40 10 20 30 40 50 60 

orf 7 9 . pep M KKLLAAVMMAGLAGA VS AAGVHVEDGWARTTVEGMK I GG AFMKI HN DE AKQD FLLGGS S 

it iTTTTTTTI 1 1 1 1 n 1 1 1 m 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 ri 1 1 n m 1 1 1 1 1 1 1 

O r f 7 9 a MKX L L AA VMMAG LAG A V S AAG I H VE DGW ARTT VEGMKMGG AFMK I HN DE AKQD FLLGG S S 

10 20 30 40 50 60 

45 

70 80 90 100 110 120 

orf 7 9 . pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

I I | M I I 1 I I I I ! II I I I I I I I I I I I I I I M I I I I I I 1 M I I I Mill I I I II 

orf 7 9a PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
50 70 80 90 100 110 120 
130 140
orf 7 9 . pep VTLKFKNAKAQTVQLEVKIAPMPAMNH 
I I I I I I I I I M I M I I I I IN 11:1 
5 orf? 9a VTLKFKNAKAQTVQLEVKTAPMSAMDHGHKHGEAHQHX 

130 140 150 

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT
51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG
101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC
151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA
201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG
251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC
301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA
351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC
401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC
451 CACGGCGAAG CGCATCAGCA CTAA

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH
151 HGEAHQH*

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

10 20 30 40 50 60 

?5 orf 7 9a pep MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 

i I l | | I I I I ! M I I M II I I: I II II I I M I I 1 II I : I I I I I I 11 I M I I I I I M I I I I 
0 ^ 7 9 _ 1 MKKLLAAVMMAGLAGAVSAAGVHVEI5GWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 

10 20 30 40 50 60 

30 70 80 90 100 110 120 

orf 7 9a pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 

I I M I I I I I I I I I I I M I I I I I I I I M M I II I I I I I I I I I I I I I I I I Mill Mill 
orf 7 9-1 pvADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 

70 80 90 100 110 120 

35 

130 140 150 

orf 7 9a . pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
1 1 I I I I M 1 II II M M I 111 I 1 M II M I I I M I I 
orf 7 9-1 VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 
40 ' 130 140 150 

Homology with a predicted ORF from N. gonorrhoeae

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

45 orf 7 9 pep FWKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101 

I 1 I I It I I M M : II II I M II M I M M I 
orf 7 9ng INDNGVMRMREVKGGVPLEAKSVTELKPGS 30 

orf 79 .pep YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 14 7 

50 * ' I I I M I I I M I I M II I I II M I I M I I I I I I I II I I Ml 1 I I I 

orf 7 9ng YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86 

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising
amino acid sequence <SEQ ID 736>:

1 ..INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV
51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH*

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 
1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT
51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg
101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc
151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA
201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA
251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC
301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA
351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC
401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC
451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>:

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA
51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH
151 HGEAHQH*

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap:

10 20 30 40 50 60 

orf 7 9-1 pep MKKLLAAVMMAG1AGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 

I I I I II I I M I I I I I i I M I II II 1 I I I M I I I I I I I : I I M I I I 1:1111 

20 or f 7 9nq- 1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM 

10 20 30 40 50 60 

70 80 90 100 110 120 

o^f 7 9-1 pen PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
25 ~ | | | | | I I i i I I I 1 M i I M I I M : II I I I I I I I I M I I M 1 I I I II I i 1 M I I I I I I I I I 

orf 7 9ng-i PVADRVE VKT HINDNGVMRM RE VKGGVPLEAKSVTELKPGSYHVMF'MGLKKQLKEGDKIP 

70 80 90 100 110 120 

130 140 150 

30 or f 7 9- 1 . pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 

I I I II I I I I I I I I I I t I I I I i I I I I ! I I II I I I I I I 
orf79ng-l VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX 

130 140 150 
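
The percentage identity quoted for an alignment of this kind can be recomputed directly from the two aligned sequences. The following is a minimal sketch under stated assumptions, not the alignment program used here: the function name `percent_identity` is hypothetical, the two inputs are assumed to be already aligned and of equal length, `-` is assumed to mark a gap, and an undetermined residue `X` is simply counted as a mismatch against any other letter.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# ORF79-1 <SEQ ID 732> and ORF79ng-1 <SEQ ID 738>, stop codons omitted.
ORF79_1 = (
    "MKKLLAAVMM" "AGLAGAVSAA" "GVHVEDGWAR" "TTVEGMKIGG" "AFMKIHNDEA"
    "KQDFLLGGSS" "PVADRVEVHT" "HINDNGVMRM" "REVEGGVPLE" "AKSVTELKPG"
    "SYHVMFMGLK" "KQLKEGDKIP" "VTLKFKNAKA" "QTVQLEVKIA" "PMPAMNHGHH"
    "HGEAHQH"
)
ORF79NG_1 = (
    "MKKLLAAVMM" "AGLAGAVSAA" "GVHVEDGWAR" "TTVEGMKMGG" "AFMKIHNDEA"
    "IQDFVLGGSM" "PVADRVEVHT" "HINDNGVMRM" "REVKGGVPLE" "AKSVTELKPG"
    "SYHVMFMGLK" "KQLKEGDKIP" "VTLKFKNAKA" "QTVQLEVKTA" "PMSAMNHGHH"
    "HGEAHQH"
)

print(round(percent_identity(ORF79_1, ORF79NG_1), 1))
```

The two sequences differ at 7 of 157 positions, which is consistent with the 95.5% figure stated above.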

Furthermore, ORF79ng-l shows significant homology to a protein from Aquifex aeolicus: 

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151

Score = 63.6 bits (152), Expect = 6e-10
Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%)

Query: 24  VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83
           V+  W      G       M I N+    D+++G    +A RVE+H  + +N V +M
Sbjct: 27  VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86

Query: 84  KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137
           +  + +  K   E K   YHVM +GLKK++KEGDK+ V L F+ +   TV+  V
Sbjct: 87  ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF79-1 (15.6kDa) was cloned in the pET vector and expressed in E. coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful
immunogen.
Example 88

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
739>:

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA
51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT
201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG
251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG
301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC
401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC
651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT
701 AA

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>:

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV
151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK
201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ*

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>:

1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf98.pep ' MTVTAAEGGKAAKALKKYLITGILWJLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
M I M ! I I I i I I ! I I I I ! I i I I I 1 I t 1 I I I i I I I I II I ! I I I M I I I I I I I I I i I M I 
orf 98a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf98 oep gfnipglgvivaiavlfvtglfaanvlgrqilaawdsllgripwksiyssvkkvseyvl 

*" | | | | M I I I I I I I I I I I I I I I I I I I I I M I M I M M I I II I I I I M I M - i 

orf 98a QFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf98 pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 
1 | | | | | | | | | | I II I I I I I I I I M 1 I M I I I I I I ! II 1 I M I I I I M I I I I I I I I I I I 
orf98a SD SSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

O r f 9 8 pep IMVKKS DVRELDMS VDEXLKYVI SLGMVI PDDLPVKTLAXPMPSEKADL PEQQX 

I I I M I I I I I M I I 1 I I I II I II M I I 1 I II I i I I M I I I I 1 I I I I I 

orf 98a IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

190 200 210 220 230 

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 



1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA
51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT
201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG
251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG
301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC
401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC
651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT
701 AA



This encodes a protein having amino acid sequence <SEQ ID 744>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ*

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 






10 20 30 40 50 60 

orf 98a . pep MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
Ml I I I M I I I 1 I I 1 II I I I I I M II I I I I M I II I I I I I i i I I I I I I I I M I I I II I I 
orf 98-1 ' MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 98a. pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 

I I i I I I M I I I I I I I I I I I I i I I I I M I I I I I I I M I I I I I I I I It I I I I I M I M I I I 
orf 98-1 G FN I PGLG VI VAIAVLFVTG LFAANVLGRQ I LAAWDSLLGRI PWKSIYSSVKKVSESLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 98a. pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
M | M I I I I II I I I I I I I I I I I I I I I I I I I I I M I I I I II I I I II I I I I I I I I I I I I M 
orf 98-1 SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

130 140 150 160 170 180 

190 200 210 220 230 

orf 98a. pep IMVKKS DVRELDMS VDEALKYV I SLGMV I PDDLPVKT LAG PMPSEKADL PEQQX 

II I I I I M I II I I I I I I I II M I I I I I I M I I I I M I I I I I II I I I M I II M I 
orf 98-1 IMVKKS DVRE LDMSVDEALKYVI SLGMVI PDDLPVKT LAG PM PS EKADLPEQQX 

190 200 * 210 220 230 






Homology with a predicted ORF from N. gonorrhoeae

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from
N. gonorrhoeae:

orf 98 .pep 
orf 98ng 
orf 98 .pep 
orf 98ng 
orf 98 .pep 
orf98ng 
orf 98. pep 
orf 98ng 



10 20 30 40 50 60 

MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSAS DQLVNLLPKQWRPQYVL 60 

|| | | M M I I ! I I I I I I I M I I I I I II M I M I I I i M I M I I I II I I I I I II I I I I I 
MTEPAAEGGKAAKALKKYLITGILWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 

GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSEYVL 120 

I I I I M I I I I I I I I I I I I I I I 1 I I I I I M I I I M I I I I I I I I I I I II I I I II I I I I : I 
GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPWKSIYSSVKKVSESLL 120 

SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180 

I | | | | | | | | M II I I 1 I I M I M I I I I I 1 I 1 M I I I 11 I M M I I I I I I I I I I I I I I I 

S DS S RS FKT PVLVP FPQSG I WT IAFVSGQVSNAVKAAL PQDGDYLS V YVPTT PN PTGGYY 180 

IMVKKS DVRELDMS VDEXLKYVI SLGMVI PDDLPVKTLAX PMP SEKADLPEQQ 233 

I i I M I I II I I I 1 I I I I I I II I I I I I II II I M I I I I I IN 111:11111 
IMVKKS DVRELDMS VDEALKYVISLGMVIPDDLPVKT LAG PMP PEKAELPEQQ 233 



The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein
having amino acid sequence <SEQ ID 746>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 



1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA
51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT
201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG
251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg
301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC
401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT
501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC
651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT
701 AA



This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>:

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap:

10 20 30 40 50 60 

orf 98-1 .pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
Ml I I I t I I I I I I I I I I I I I I I M I It M I! I I I M I I I I M I M 11 II I I II I M I I I 
orf98ng-l MTEPAAEGGKAAKALKKYLITG I LWLPIAVTVWVVSYIVS AS DQLVNLLPKQWRPQYVL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 98-1 .pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 

i ii 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 ii 1 1 1 1 1 m i i 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 M I 




orf98na-l GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 
° ri ^° y 70 80 90 100 110 120 

130 140 150 160 170 180 

^ nrf Q 8 -l D eo SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 

D o—^o -P P ,, M | M MM I I II II I 1 I 1 I I 1 I I I ! i I I I 1 I I I 1 I = i I i I I I I I I I I I I I I 

orf98na-l SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 
° rry 9 130 140 150 160 170 180 

in 190 200 210 220 230 

orf98 _l pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
| | M | | | | I | I 1 I I I I I I II I I I I I I I I I I I I I I I I I I M I I I H I : I I I I M 
orf98na-l IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX 

190 200 210 220 230 

Based on this analysis, including the fact that the putative transmembrane domains in the
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.

Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>:

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GsGgTACTCA
201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG
251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT
301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC
351 GaGAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC
401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA
451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC
501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG
551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA .ATTCGTTAC
601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT
651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT
701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA
751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG
801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC
851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA
901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG
951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT
1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA
1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG
1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG
1151 GCGGAGGCGC AC...

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF
101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA
351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH...

Further work revealed the complete nucleotide sequence <SEQ ID 751>:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA
201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG
251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT
301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG
351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA
401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT
501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT
601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC
651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG
701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT
851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT
951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC
1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA
1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG
1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG
1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC
1201 GCAGCGTTAG AGCAGCATAG CTGA



This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>:

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH
401 AALEQHS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386aa overlap with an ORF (ORF100a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 100. pep MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 
I I i I I I I I I I I I I I I I I I I I 1 I I I t I I I I t 11 M I t I I I I II I I It I I I I 11 I I I I M 
orf 100a MKTWWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
I I I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I M II I I I I I I I M II : III 
orf 100a FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 100 . pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
I I I I I I I I 1 I 1 I I I I I I I t I M I I It I I I I I I I ! I I II I It II II I M I I I I I I I I I 
orf 100a TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170' 180 



190 200 210 220 230 240 

' orf 100 . pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERY QNWAYRRQLA 

M I I I I I I I 1 I I I I I : I II I II M I I I I I I i I I I I I I I I I I I I I I I I 11 I I I I I I I 
orf 100a AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 100 . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
I M I I I I I I I I It I II I II I I I I II M I I I I I I M I I I I I I I I I M I I I I I I I I I I 1 I I 
o r f 1 0 0 a DAADAAALKTCLKRI PDS LKNGELS VS VAEKYERLGLYADAVKWVKQHYPHNRRPELLE A 

250 260 270 280 290 300 



310 320 330 340 350 360

o r f 1 0 0 oeo FVE S VRFLGEREQQKAI DFADAWLKEQPDN ALLLMYLGRLAFGRKLWGKAKG YLEAS I AL 
' F [M II I I MM: I M I I M I I II IN M 1 I I I I I M I M I : II I I I M I M M I M I I I 

orf 100a FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 



370 380 
o r f 1 0 0 . pep KPS I SARLVLTKVFDE IGEPQKAE AH 
M I I I I I I I I : II I I I I M I II M : 
orf 100a KPS I S ARLVLAKV FDETGE PQKAEAQRN LVLAS VAEENRPS AETHX 

370 380 390 400 

The complete length ORF100a nucleotide sequence <SEQ ID 753> is:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT
51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA
201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG
251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT
301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG
351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA
401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT
501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT
601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC
651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG
701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT
851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT
951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC
1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA
1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG
1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG
1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT
1201 TCCGCCGAAA CCCATTGA



This encodes a protein having amino acid sequence <SEQ ID 754>: 



1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP
401 SAETH*



ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap:



10 20 30 40 50 60 

orf 100a pep MKTWWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 
| | | I | I I I I I I I I I I I I i II M M ! I M I I M M I I I I I II I I I I M M I I I I I M II 
orf 100-1 MKT VW I VVLFAAAVG LALASG I YTGDVY I VLGQTMLRIN LHAFVLGSLI AVVVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orflOOa pep ■ FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
M I I M I M I I M I II I I M I I I I M I I I I I I I I I I M I I I I M I M M I I I I II II I 
orf 100-1 FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 100a . pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

M I II I M M I I I I I M I I II I I I I II M I I I I I II I I I I I I I I I I M I I I 1 M 

orf100-1 TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH

130 140 150 160 170 180


190 200 210 220 230 240 

or-f 100a . pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

Mi I M 1 I I I I I I I I I M I I I I I I 1 I M II Mill I II I I I M I I I I I I I I I 

orf 100-1 AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100a . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
I | | M M I M I I I I II I I II I I I I I M II I I I I I I I I I I I 1 I I I I I I I M I I I I I I I I I I 
orf 100-1 DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orflOOa pep FVE S VRFLGERDQQKAI DFADAWLKEQPDNALLLX YLGRLAYGRKLWGKAKG YLEAS I AL 
M I I I M I I I I : I I M I I I I I I I I I M I I I I M I I I I II I II I 1 I I I M M I I I I I I I 1 
orfl00-l FVE S VRFLGEREQQKAI D FADAW LKEQ PDNALLLM YLGRLAYGRKLWGKAKG YLEAS I AL 

310 320 330 340 350 360 

370 380 390 400 

orf 100a . pep KPS I S ARLVLAKVFDETGE PQKAEAQRNLVLASVAEENRPSA-ETHX 

I I I I I M I M I I I I II I I I I I I I I I I I I I I : I : : : : I : I I I 
orfl00-l KPSI SARLVLAKVFDE I GE PQKAEAQRNLVLEAVS DDERHAALEQHSX 

^370 380 390 400 

Homology with a predicted ORF from N. gonorrhoeae 

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from
N. gonorrhoeae:



orf 100 .pep 
orf 100ng 
orf 100 .pep 
orf lOOng 
orf 100 .pep 
orf lOOng 
orf 100 . pep 
orf lOOng 
orf 100 .pep 
orf lOOng 
orf 100 .pep 
orf lOOng 
orf 100 .pep 
orf lOOng 



MKTVVWIVVLFAAAVGLAIASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 60 

M | | | I M M I I I I I I I I I I I I I I I M II I I I I II I M M I M I I I I I I M I I I I I M I I 

MKTVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 60 

FIIGVLNIPEKMQRFGSARKGXPCXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120 

1111111111:1:1 I I I I I I I I I I M I I M I I I I I I I I I II I I I I I I ! I : II! 

FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120 

TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

I I I I I I MMIIIMI I I I I M I I ! i M I I I I I II I I M I I 1 I II 1 I II I ! I I I I II 

TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 24 0 

M I I I M I I M I I I i : M M M I II \ I I I I I M I I I I I I I I I I I I I I I I I I I I I II I : I 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 24 0 



DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
M I I I I I I I I I I M M I I M I II I I II I I I I I I I I I I I I M II I M II I I I I I I M I I I 
DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKOTKQHYPHNRRPELLEA 



300 



300 



360 



FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 
I I M I M I I i I M M M I I I I : M I M I ! I I I I I I I I I II I : I I I I I I I II I I II I I I I I 
FVESVRFLGEREQQKAI DFADSW LKEQ PDNALLLM YLGRLAYGRKLWGKAKG YLEAS IAL 3 60 



KPS I SARLVLTKVFDE IGE PQKAE AH 
INI II I I I : I I I I I : : Mill: 

KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 



386 



405 



The complete length ORF100ng nucleotide sequence <SEQ ID 755> is:

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC
101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT
151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA
201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG
251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT
301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG
351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA
401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT
501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC
601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC
651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG
701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT
851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT
951 CGATTTTGGC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC
1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA
1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG
1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG
1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT
1201 TCCGCCGAAA CCCGTTGA



This encodes a protein having amino acid sequence <SEQ ID 756>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP
401 SAETR*



ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap:



orf 100-1 .pep 
orf 100ng 

orf 100-1 .pep 
orf lOOng 



orf 100-1 .pep 
orf lOOng 

orf 100-1 .pep 
orf lOOng 

orf 100-1 -pep 
orf lOOng 

orf 100-1 .pep 
orf lOOng 

orf 100-1 .pep 
orf lOOn 



10 20 30 40 50 60 

MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 

I | I | M I I I I II I I I M I I I ! I I I I M I I I I M II II I I I I I! I II M I I I I I II 

MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

|| | M | || I I : I : I I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I MINIM 
FI IGVLNI PENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

| | | | | | | I I I M I I I M I 11 I I M M M M M I M I M M I I I I I I M II I M II II M I 
TUVLMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
130 140 150 160 170 180 

190 200 210 • 220 230 240 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
I I i I M | I M M II M I I II II M I II I I M I M I MM II I I I I I I I M II I I I I M M 
AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQt3WAYRRQMA 

190 200 210 220 230 240 

250 260 270 280 290 300 

DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

M || | || M I M II II I I II M II I II I I M* II M I 1 I II II II II I II I I II I M I I M 
DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
250 260 270 280 290 300 

310 320 • 330 340 350 360 

FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

I I | M I I M M M M II II M M II M I II M 1 M M I 11 M M II I I M I II I M I I I I 
FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
310 320 330- 340 350 360 

370 380 390 400 

KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

M I I 1 II I! II I I I I : : I M I II I II II M : : : I : I 
KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX . 



370 380 390 400

Based on this analysis, including the presence of a putative leader sequence, a putative
transmembrane domain, and an RGD motif, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
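
The RGD motif referred to above can be located with a simple string scan. The following is a minimal sketch and not the analysis software used for this work; the function name `find_motif` is hypothetical, and the input is only the fragment listed at residues 201-210 of ORF100-1 (SEQ ID 752).

```python
def find_motif(sequence, motif, offset=1):
    """Return the 1-based positions at which `motif` starts in `sequence`.

    `offset` is the residue number of the first character, so that a
    fragment can be scanned while keeping its original numbering.
    """
    sequence = sequence.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(motif, start + 1)
    return positions


# Residues 201-210 of ORF100-1 as listed in the sequence above.
fragment = "FDRGDALQVL"
print(find_motif(fragment, "RGD", offset=201))  # [203]
```

Scanning a numbered fragment rather than the whole protein keeps the example independent of any transcription errors elsewhere in the listing.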



Example 90 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
757>:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC
301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC

101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF*

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*
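
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the DNA in frame. The sketch below assumes the standard genetic code and reads from the first nucleotide; the names `CODON_TABLE` and `translate` are hypothetical, and reporting ambiguous codons as X is a convention chosen here for illustration, not a statement of how the sequences in this document were produced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Whitespace is ignored. Codons containing ambiguity codes (N, S, ...)
    are reported as X. The stop codon is reported as '*'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# First 30 nucleotides of SEQ ID 759 as listed above.
print(translate("ATGATGTTTT CTTGGTTCAA GCTGTTTCAC"))  # MMFSWFKLFH
```

Translating the listing in this way is also a convenient check on transcription of the protein sequence, since each residue can be confirmed against its codon.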

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647)
ORF102 and HP1484 show 33% aa identity in 143aa overlap:



orf102 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62
         F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++
HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65

orf102 63 GAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119
          G + + + GW+H KL L++LLAY YC +R + + R+Y
HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRNARFY 125

orf102 120 RVFNEIPXXXXXXXXXXXXFKPF 142
           RVFNE P KPF
HP1484 126 RVFNEAPTILMILIVILVWKPF 148



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142aa overlap with an ORF (ORF102a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orfl02 pep mmfswfklfhlffviswfaglfylprifvnmamidvprgnpeyvrlsgmavrlyrfmspl 

I I M I I M I I II M I I I I M I I I I I I i I I I I I I I I I I I I I I I I II I I I I I I I I I I 

or f 102a mmfswfklfhlffviswfaglfylprifvnmamidvprgnpeyvrlsgmavrlyrfmspl 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 102 . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDySNAFSHRWYR 

M I I I I I I M I I I I I I II I M I I 1 I I I I I M I I I I II I 11 M I I I I I I I I I I I I I 

orf 102a GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
15 " 70 80 90 100 110 120 

130 140 
orf 102. pep VFNE I PVLLMVAALYX W FK P FX 
I I I I I I I I II I I I I I I I I I I I I 
20 orfl02a V FNE I P VLLMV AAL Y LW FK P FX 

130 140 

The complete length ORF102a nucleotide sequence <SEQ ID 761> is:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 762>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*

ORF102a and ORF102-1 show complete identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102a . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I | I i | i 1 I I I I I II I I I I I I M I I II M I I I I II i M i I I I I I I II II if I I I I I I I I I I 
40 orf 102-1 MM FSWFKLFH LFFV I SW FAG LFYL PR I FVNMAM I DVPRGN PEYVRLSGMA VRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102a . pep G FG A W FGAA I PFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDY SNAFSHRWYR 

45 * * II I I I I I I I I M M I I II I I I I 11 I I M I I II I I I I I I I I I I I I I M I I II I I 

orf 102-1 GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100- 110 120 

130 140 
50 orf 102a. uep VFNE I PVLLMVAALYLWFKP FX 

I I I I I I I M I II I I I I I I I I I I I 
orf 102-1 VFNE I PVLLMVAALYLWFKP FX 

130 140 

Homology with a predicted ORF from N. gonorrhoeae

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N.
gonorrhoeae:

orf102.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60
           |||||||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf102ng   MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60

orf102.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120
           |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf102ng   GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120

orf102.pep VFNEIPVLLMVAALYXVVFKPF 142
           ||||||||||||||| ||||||
orf102ng   VFNEIPVLLMVAALYLVVFKPF 142

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG 

151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG

351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 764>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102-1 .peD MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I I II I I I I I I I M I I I 1 I I I I i I I II I I I I I I i I I : t I M I I I I I I I I M I I I I I I I I I I 
orf 102ng MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102-1 . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
I I II I I I II I I I I 1 I I I I I I I I I I I II I I I I M I I I I M II I I I I I M I I I I I II I II I 
orf 102ng GFGAWFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf 102-1 .pep V FNE I P VLLMVAAL Y L W FK P FX 
I I I M II II 11 I I I I I I I I I I II 
orf!02ng VFNEI PVLLMVAALYLWFKPFX 

130 140 
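
The identity figure quoted for this ungapped alignment can be reproduced by counting matching positions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the two sequences are written out as translated from SEQ ID 759 and SEQ ID 763 (so "VV" appears where the scanned listing may show "W").

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differences(seq_a, seq_b):
    """List (1-based position, residue in seq_a, residue in seq_b) for each mismatch."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


ORF102_1 = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL"
    "GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR"
    "VFNEIPVLLMVAALYLVVFKPF"
)
ORF102NG = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL"
    "GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR"
    "VFNEIPVLLMVAALYLVVFKPF"
)

print(round(percent_identity(ORF102_1, ORF102NG), 1))  # 98.6
print(differences(ORF102_1, ORF102NG))  # [(36, 'V', 'A'), (77, 'W', 'R')]
```

Two mismatches in 142 positions give 140/142, which rounds to the 98.6% stated above.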

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori: 

gi|2314656 (AE000647) conserved hypothetical integral membrane protein
[Helicobacter pylori] Length = 148
Score = 79.2 bits (192), Expect = 1e-14

Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 

Query: 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

Sbjct: 8 FLWVKAFH V I AV I SWMAALFYLPRLFVYHAENAHKKE FVGWQI QEK — KL YS FI AS PAM 65 

Query: 63 GAWFGAAIP FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS 115 

G + + F +G GW+H KL L ++LLAY YC +R + + 
Sbjct: 66 GFTLITGILMLLIEPTLFKSG GWLHAKLALWLLLAYH FYCKKCMRELEKDPTRRN 121 

Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 

R+YRVFNE P KPF 
Sbjct: 122 ARFYRVFNEAPTILMILIVILVWKPF 14 8 
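
The percentages in the summary line of this report follow directly from the counts over the 147-column alignment. The sketch below is illustrative only: the helper `blast_percent` is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the figures shown (13/147 is about 8.8% but is reported as 8%).

```python
def blast_percent(count, alignment_length):
    """Integer percentage of `count` over `alignment_length`, truncated."""
    return (100 * count) // alignment_length


alignment_length = 147
for label, count in (("Identities", 50), ("Positives", 68), ("Gaps", 13)):
    pct = blast_percent(count, alignment_length)
    print(f"{label} = {count}/{alignment_length} ({pct}%)")
```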



Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 91 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>:

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG

451 CCGCGCCGAT AA

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 

1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G

51

101

151

201 I SFTILSEPDT

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 

1 ..GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG

451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT

551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 

601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC

651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 

701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 

751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 

801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 

851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA

901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 

951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>:

1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR*

Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF85 shows 87.8% identity over a 41aa overlap and 99.3% identity over a 153aa overlap with
an ORF (ORF85a) from strain A of N. meningitidis:

10 20 30 40 

orf 85 .pep MAKMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 
I \ I I II I I I II I II I I I II I I i ! MM!:: I I i I I I i I 
orf 8 5a MAKMMKWAAVAAVAAAAWGGWS YLKPEPQAAYITETVRRGDI SRTVSATGE I S PSNLVS 

10 20 30 40 50 60 

// 

80 90 100 

orf 8 5. pep ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

I I I I I I I II I I I I I I ill I I I I I I i i I I 1 I 
orf 85a T I VQLAN LDMMLNKMQ I AEG D I TKVKAGQD ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

210 220 230 240 250 260 

110 120 130 140 150 160 

orf 8 5 . pep G YN S S T DT ASN AV Y Y YARS FVPN P DGKLATGMTTQNT VE I DG VKN VL HPS LT VKNRGGK 
I I I I I M I I I I I I I I I I II I I I II M It t I I I I I I I I i I I I I I I M II I I I I I I I I I I I : 
orf 85a GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGR 
270 280 290 300 310 320 

170 180 190 200 210 220 

orf 85 . pep AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 
I I I I ! I M I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I II II I I I I I II I i I I I I I 
orf 8 5a AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 
330 340 350 360 370 380 

230 

orf 8 5. pep PRRX 
MM 

orf85a PRRX 
390 



The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA
101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA
151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG
201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG
251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA
401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT
451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA
551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG
1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG
1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA



This encodes a protein having amino acid sequence <SEQ ID 770>: 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL
151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 






orf 85a .pep 
orf85-l 

orf85a.pep 
orf85-l 

orf 85a . pep 
orf85-l 

orf 85a. pep 
orf85-l 

orf 85a. pep 
orf85-l 

orf 85a. pep 
orf85-l 



30 40 50 60 70 80 

PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I II I II I M I I I I II 11 I M 11 

VSVGAQASGQIKILYVKLGQQVKKGDLIAE 
10 20 30 

90 100 HO 120 130 140 

INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 

I I I I I M I I I I I I I M I I I I 1 I I I I M I I I I M I I I I I I I :: I I : I I I I I I I I I 

INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 

150 160 170 180 190 200 

ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 
| : M I II I II I II I I 1 I M I I I I I I II I i I I I I I II I I I I I I I I I I I I I I M I I I I M I I 
AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 



210 220 230 240 250 260 

PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
I I M I I M I I M I I I I M I I I I I I I M I I I I I I I I I I I I I I I I II M I I I I II II I I I II 
PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

GG YNS ST DTASNAVYYYARSFVPNPDGKLATGMTTQNTVE I DGVKNVLI I PSLTVKNRGG 
M I I II I I II II I I I M I I I I I M I M I I I M I I M I H I I 1 I I I M I I I I I I I I I M M 
GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVE I DGVKNVLI I PSLTVKNRGG 
220 230 240 250 260 270 

330 340 350 360 370 380 

RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
: I I | I I M I I I I I I I I I M 1 I I I I I M I I M I I i M 1 I I I I I I I I I I I I I M I I I I M I I 
KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 



390 
PPRRX 
I I I II 
PPRRX 



orf 85a . pep 
orf85-l 

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a. 



Homology with a predicted ORF from N. gonorrhoeae 
ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae:






ORF85 
ORF8 5ng 

0RF85 
ORF85ng 
ORF8 5 
ORF85ng 
0RF85 
0RF8 5ng 
ORF85 
ORF85ng 



1 MAKMMKWAAVAAVAAAAVWGGWS . LKPEPHVLDITETVRRG 4 0 

I M | M I II I I I I II I I II I I I I INI!:: I I I : I M I 
1 MAKMMKW AAV AAVAAAA VW GG W S Y LK PE PQAAY I T E AVRRG D I S RT V S AT 50 



201 



: . ISFTILSEPDT 

I I I I I I I I I I I 

T VN AAQ S T PT I VQLAN L DMMLNKMQ I AEG D I T KVKAG Q D I S FT I L SE P DT 



2 51 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 
I I I I II I I I I I I I I I I I I M M II M I I I I I I I M I I I I I M I I I I I I I I 
251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 



250 



250 



300 



300 



301 MTTQNTVEIDGVKNVLIIPSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350 

I 1 I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I M 1 M I I I I I I I I I 
301 MTTQNTVEIDGVKNVLLIPSLTVKNRGGKAFVRVLGADGKAVEREIRTGM 350 

152 RDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 

: II 11 I I I I I I I I I I I I I I I I I III I I I M I I I I II I I I II I 
351 KDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 



The complete length ORF85ng nucleotide sequence <SEQ ID 771> is:

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA
101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG
151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG
201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG
251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA
401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT
451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA
551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG
1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG
1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA



This encodes a protein having amino acid sequence <SEQ ID 772>: 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL
151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM
351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*



ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap:
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30 40 50 60 70 80 

orf85nq PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

II M I I II I I II 1 I I II I I I I M I I I I I I 
o r f 8 5 - 1 VS VGAQASGQ I K I LYVKLGQQVKKGDLI AE 

10 20 30 

90 100 110 120 130 140 

orf 85ng INSTTQTNTIDMEKSKLETYQAKLVSAQ1ALGSAEKKYKRQAALWKDDATSKEDLESAQD 

I I I I : I I I i : : I I I I II 1 I I I II II I II I II I I I I I I I I I I 1 I I I :: I I I M I M II M 
orf 8 5-1 INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 

40 50 60 70 80 90 

150 160 170 180 190 200 

orf 85ng ALAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTWAIPVEEGQTVNAAQST 
I : I I I I I M I I I I I I I I M I I I I I I I II : I II I I I I I I I II I I I I I I II II I I I I II M 
orf 85-1 AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 

100 110 120 130 140 150 

210 220 230 240 250 260 

orf85ng PT I VQLANLDMMLNKMQIAEGD ITKVKAGQDI S FT I LSEPDTPIKAKLDSVDPGLTTMSS 

I I I I M I I I I I M I I I I I M I 1 II I I I M I II 1 I I I M I I I I I I 1 I I I I I I I I I I I I I II 
orf 8 5-1 PT I VQLANLDMMLNKMQIAEGD I TKVKAGQD IS FT I LSEPDTPIKAKLDSVDPGLTTMSS 

160 170 180 190 200 210 

270 280 290 300 310 320 

orf 85ng GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 

I I I I I I I II II I I I I I I I I I I I I I I M M I I I I M I I I I I I I I : I II I I I I I I I I 

orf 8 5-1 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
220 230 240 250 260 270 



330 



340 



350 



360 



370 



380 
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orf85ng KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
| | I 1 I I) I I I I I I : j I I I I I I I : I I I I I I I M I II I I I I M I I I I I M M I M I I I I M 1 
O-f 85-1 KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 

5 

390 

orf85ng PPRRX 
Mill 

orf85-l PPRRX 
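
An identity figure for an overlap like the one above is the ratio of matching columns to aligned columns. The following is a minimal Python sketch of that ratio, assuming the two sequences are already aligned and of equal length. The function name `percent_identity` is hypothetical, and the text does not state which program or scoring convention produced the quoted percentage, so this illustrates the arithmetic only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (matches, compared_columns, percent) for two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # ignore columns that are gaps in both sequences
        compared += 1
        if a == b and a != gap:
            matches += 1
    if compared == 0:
        raise ValueError("no columns to compare")
    return matches, compared, 100.0 * matches / compared


# 21-residue excerpt taken from the ORF85ng / ORF85-1 alignment above
ngo = "INSTTQTNTIDMEKSKLETYQ"  # orf85ng
men = "INSTSQTNTLNTEKSKLETYQ"  # orf85-1
matches, compared, pct = percent_identity(ngo, men)
print(f"{matches}/{compared} identical ({pct:.1f}%)")
```

Running the same function over the full 334-residue overlap would give a figure comparable to the one quoted, provided gaps are handled the same way as in the original analysis.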

In addition, ORF85ng shows significant homology to an E. coli membrane fusion protein:

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia
coli] Length = 380
Score = 193 bits (485), Expect = 2e-48
Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%)



Query: 29  PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88
           P Y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L+
Sbjct: 41  PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100

Query: 89  INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148
           I+ N I ++ L +A+ A+ L A Y RQ L + A S++
Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160

Query: 149 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 208
           I++++ S++TA+++L YTRI A M G V I +GQTV AAQ
Sbjct: 161 EMAVKQAQIGTIDAQIKRNQASLDTAKTNLDYTRIVAPMAGEVTQITTLQGQTVIAAQQA 220

Query: 209 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 268
           P I+ LA++ ML K Q++E D+ +K GQ FT+L + P T + ++ VP
Sbjct: 221 PNILTLADMSAMLVKAQVSEADVIHLKPGQKAWFTVLGDPLTRYEGQIKDVLP 273

Query: 269 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328
           + + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G
Sbjct: 274 TPEKVNDAIFYYARFEVPNPNGLLRLDMTAQVHIQLTDVKNVLTIPLSALGDPVG 328

Query: 329 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISE 372
           +V L +G+ ERE+ G ++ + E+ GL+ GD+VVI E
Sbjct: 329 DNRYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEVVIGE 373

Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF85-1 (40.4 kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a
surface-exposed protein, and that it is a useful immunogen.
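
A protein mass such as the figure quoted for ORF85-1 can be approximated from the amino acid sequence alone. The sketch below, in Python, sums standard average residue masses and adds one water molecule for the free termini. It is an illustration only: the text does not say how its own figure was derived, the function name `approx_mass_da` is hypothetical, and the calculation ignores any fusion partner, leader-peptide cleavage, or post-translational modification.

```python
# Average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water for the free N- and C-termini


def approx_mass_da(protein: str) -> float:
    """Approximate average mass of an unmodified polypeptide, in daltons."""
    seq = "".join(protein.split()).rstrip("*").upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[r] for r in seq) + WATER


# Residues 51-70 of the ORF85ng protein listing, used purely as sample input
fragment = "GEISPSNLVS VGAQASGQIK"
print(f"{approx_mass_da(fragment) / 1000:.2f} kDa")
```

Whitespace and a trailing stop symbol are stripped, so a sequence can be pasted in the ten-residue block layout used in the listings.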

Example 92 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>: 

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT
51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA
101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG
151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG
201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT
251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG
301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA
351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA
401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG
451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC
501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC
551 CGTAA

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>:

1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR
51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG
101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP
151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP*
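
The correspondence between a DNA listing and its amino acid sequence can be checked by translating the DNA codon by codon. The following is a minimal Python sketch using the standard genetic code. The function name `translate` is hypothetical, reading starts at the first base supplied, and reporting codons with ambiguous or unreadable bases as X is an assumption made here for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons containing anything other than A, C, G, T become X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of the partial ORF120 DNA listing (SEQ ID 773)
print(translate("ATTCCCGCCA CGATGACATT TGAACGCAGC"))
```

Because whitespace is ignored, a listing can be passed in with its ten-base blocks intact once the position numbers are removed.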

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA
651 CGGCCAGGCA GCCAAACCGT AA

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF120 shows 92.4% identity over a 184 aa overlap with an ORF (ORF120a) from strain A of N.
meningitidis:

10 20 30 

or f 120, pep I PATMTFERSGNAYKIVST I KVPLYNIRFE 

till : I I I I I I I I 1 II I I I I I I i 

orfl20a SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE 
10 20 30 40 50 60 



40 50 60 70 80 90 

orf 120. pep SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 
I I I I I II II I I I i I I I I M I I I \ I I 1 M I I I I I 11 1 I I I : I I I I I I i I I I I It I I 

orf 120a SGGT WGNTLH PT YYRD IRRGKLYAE AKFADGS VT YGKAXXXXXXQS PKAMDLFTLAWQL 

70 80 90 100 110 120 



100 110 120 130 140 150 

orf 120. pep AAN DAKL P PG LK I TNGKKL Y S VGG LNKAGTGKY S I GGVETEWKYRVRRG DDAVMYFFAP 
I I I I I I I I I I I I I I I I I II I M I I I I I I M I I I I I I I I M I I I I II M M I I I I I I I I I I 
' orf 120a AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
130 140 150 . 160 170 180 
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160 1*70 180 

orf 120 . pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
M I I 1 I I I I I I I M I I I I I I I M M I I I I II II M 
orf 120a SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
190 200 210 220 

The complete length ORF120a nucleotide sequence <SEQ ID 777> is:

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC
151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA
651 CGGCCAGGCA GCCAAACCGT AA



This encodes a protein having amino acid sequence <SEQ ID 778>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX
51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf 120a . peo MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK- 

I I I I I I M I I I I I I I I 1 I I I M I I I I I I I 1 II II I I I M M I : II I i I M If 

orf 120-1 MMKT FKNI FSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 120a . pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 
I I I i I I I I I I II I I I M I I I I I I II I I I I I I I I I M I I I I I II I I I I I : MINI 

orf 120-1 VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 120a . pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
I I I I II I I I I I I I 11 I I II I I 11 II M I I II I I I M I I I I I I I I I I M I I M I I II I I I I 
orf 120-1 DL FT LAWQLAAN DAKL PPGLK I TNGKKLYSVGGLNKAGTGKYS I GGVETE WKYRVRRGD 

130 140 150 160 170 180 

190 200 210 220 

orf 120a. pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
I M I I I I M I I I II I I I I I I I I I I I I I I I I I I I I I I M I I II I I 
orf 120-1 DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

190 200 210 220 

Homology with a predicted ORF from N. gonorrhoeae

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 

orf 12 0. pep IPATMTFERSGNAYKIVSTIKVPLYNIRFE 30 

j I I I I I I I I II I I I I I I I I II I I I I I I I I I 
orfl20ng SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE 69 

orf 120 . pep SGGT WGNT LH PT YYRD I RRGKLYAEAKFADGS VT YGKAGE SKTEQS PKAMDLFTLAWQL 90 

| | | | M I I I I I I : M : I I I I I I I I I I I I I I I I I I 11 I II M I I I I M I I I I I I I I I I 1 I I 
orf 120ng SGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 129 






orfl20.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 150 

I I M I I I I M I I I I M I I I I I i I I I M I I I I I I M I I I I I I I II II I I I I I I : I Mill 
orfl20ng AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDTVTYFFAP 18 9 



orf 120. pep 
orf 120ng 



SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 184 

I I I I I I I I II I I I I I I I I M I 1 I M I I I I I I I I I 
SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223 



The complete length ORF120ng nucleotide sequence <SEQ ID 779> is:

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT
251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA
551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA
651 CGGACAGGCC GCCAAACCGT AA

This encodes a protein having amino acid sequence <SEQ ID 780>:

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap:



35 



40 



45 



50 



orf 120-1 .pep 
orf 120ng 



orf 120-1 .pep 
orf 120ng 



orf 120-1 .pep 
orf 120ng 

orf 120-1 .pep 
orf 120ng 



10 20 30 40 50 60 

MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 
I I I I I I M I I I II I I I I I I I I I I I i I I I I I M I I I I 1 I I I I I I M II I I I M I M I I I I 
MMKT FKN I FS AAI LSAALPCAYAARLPQSAVLHYSGS YG I PATMT FERSGNAYKI VST IK 

10 20 30 40 50 60 

70 80 90 100 110 120 

VPLYN1RFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

I II I I I M I I I I I I I II I I I I : I I : I I I I I I I I M I I I I I I M M I I I I I I M I I I I I I I 
VPLYNIRFESGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 
70 80 90 100 110 120 

130 140 150 160 170 180 

DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
I I I M I I I I I II I II I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 1 I I 
DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 

130 140 150 160 170 180 

190 200 210 220 

DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
1:1 I I I I I II I I II I 1 I I ! I II II I I I I I I I I I I II I II I I I I 
DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

190 200 210 220 



55 



This analysis, including the presence of a putative leader sequence in the gonococcal protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 93 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC
51 .GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC
401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AGGCAGGGCG GCAATATT..

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>:

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNI..

Further work revealed the complete nucleotide sequence <SEQ ID 783>:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC
401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA
551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA
601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT
651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC
701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC
751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT
801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA
901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT
951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC
1051 AGTTTTTACC GGGGCAGGTA G



This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF121 shows 98.7% identity over a 156 aa overlap with an ORF (ORF121a) from strain A of N.
meningitidis:



55 



orf 121 . pep 



orf 121a 



orf 121 .pep 



10 20 30 40 50 60 

MYRRKGRG I KPWMGAGXAFAALVWLVFALG DTLTPFAVAA VIA YVLDPLVEWLQKKGLNR 

I I M I 1 I I I I I M II I I I I I I I II I II I I I I I I II I I I I I I I I II I II II I II I I II I 
MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
10 20 30 40 50 60 

70 80 90 100 110 120 

ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 






10 



I ! t I I I 1 I I I I I I 1 I I I I I I M I I M I 1 I I I I I I I I I M I M I I I I i I I I I t I I I I t I I I 
orf 121a ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 

130 140 150 

orf 121 .pep E IDQAS I IAWLQAHTGELSNALKAWFPVLMRQGGN I 
I I I I I I M I I II i I I I I I I I I I I I 1 I I I I I I I M I I 
orf 12 la E I DQAS 1 I AWLQAHTGEL SNALKAWFPVLMRQGGN I VS S I GNLLLL PLLLYY FLLDWQRW 

130 140 150 160 170 180 

orf 12 la SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 

190 200 210 220 230 240 

The complete length ORF121a nucleotide sequence <SEQ ID 785> is:

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC
401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA
551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA
601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT
651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC
701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT
751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT
801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA
901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT
951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC
1051 AGTTTTTACC GGGGCAGGTA G



This encodes a protein having amino acid sequence <SEQ ID 786>:

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*



ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap:



50 



10 20 30 40 50 60 

orf 121a . pep MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
I I I I I II I 1 I I I I I 1 I II I 11 ! II I i I I I I I I I I II I I I II I I I 1 ! It I I I II I I I M I 
orf 121-1 MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 ' 50 60 



55 



70 80 90 100 110 120 

orf 121a. pep ASASMSVWFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
I I I I I I I I I M I I ! I I ! I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I 1 I I I 11 11 I I 
orf 121-1 AS ASMS VMVFS L I LLLALLL 1 1 V PMLVGQFNNLAS RLPQL IG FMQNTLLPWLKNT I GGYV 

'70 80 90 100 110 120 



60 



130 140 150 160 170 180 

orf 121a . pep EI DQAS I IAWLQAHTGELSNALKAWFPVLMRQGGN I VSSIGNLLLLPLLLYYFLLDWQRW 
I I II I \ I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I 
orf 121-1 EI DQAS I IAWLQAHTGELSNALKAWFPVLMRQGGN I VSSIGNLLLLPLLLYYFLLDWQRW 

130 . . 140 150 160 170 180 



65 



190 200 210 220 230 240 

orf 12 la . pep SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 






10 



15 



orfl21-l 

orfl21a.pep 
orf!21-l 

orf 121a. pep 
orfl21-l 



I ] | | | M I I I I I I ! I I M I I I M II I I I I I ! I M I I I II I I I M I I I I I I M I I I M I! I 
SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 

190 200 210 220 230 240 

250 260 270 280 290 300 

GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 
| I : | | | I I I I M I II I M I I I M I I I I 1 I I M M I I I : I I I I I I 1 I I I I I I I I I I I I i I I 
GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 

250 260 270 280 290 300 

310 320 330 340 350 

DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 

I | | I I I M M I I I I I I I 1 M I I I I M I I I I I I i I I I I I 1 M I I M I I I I I 1 I I I M I 
DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 
310 320 330 340 350 



Homology with a predicted ORF from N. gonorrhoeae 

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from N. gonorrhoeae:

orf 121 .pep 
orf 121ng 
orf 121. pep 
orf 121ng 
orf 121. pep 
orf 121ng 



MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

| | || | | | | | I I I I I I I I I I I I II I I : I I I M I I I I I M I I I M I I I I I I I I I I I I I M I 
MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

I || | | | ! I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I M I I I I I I I I I I I I I I 
ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 



E I DQAS 1 I AWLQAHTGELSNALKAWFPVLMRQGGN I 

I I I I I I M I I : I I I I I I I I I M I I I I I I I I : I M I I 
eidqasiiawfqahtgelsnalkawfpvlmkqggnivstignlllpp: 



60 



60 



120 



120 



156 



LLLYYFLLDWHRW 180 



An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino acid sequence <SEQ ID 788>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM
151 KQGGNIVSTI GNLLLPPLLL YYFLLDWHRW SCGIPKLVPR RFAGAYTRIT
201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW
251 GGG*



Further work revealed the following gonococcal DNA sequence <SEQ ID 789>:

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC
251 GTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC
401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA
551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG
601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT
651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC
701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC
751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT
801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG
851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA
901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT
951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC
1051 AGTTTTTACC GGGGCAGGTA G






This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM
151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG
351 SFYRGR*



ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap:



15 



20 



orf 121-1 .pep 
orfl21ng-l 



orf 121-1 .pep 
orf!21ng-l 



10 20 30 40 50 60 

MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

I I I M I I I I I I I I I I I M I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
M YRRKGRG I K P WMGAG AAFAAL VWL VYALG DTLT P FAVAAVLA YVL D PLVE WLQKKGLNR 
10 20 30 40 50 60 

70 80 90 100 110 120 

ASASMSVMVFSLILLLALLLIIVPMLVGQFNNIASRLPQLIGFMQNTLLPWLKNTIGGYV 
I I I M I I I I I I I M I I I I I I I I t I II I I I I I I I I M I I I I I I I 1 I I I I I I I I I I I I I I I I 
ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 



25 



orf 121-1 . pep 
orfl21ng-l 



130 140 150 160 170 180 

EI DQAS I IAWLQAHTGELSNALKAWFPVLMRQGGNI VSS IGNLLLLPLLLYYFLLDWQRW 
I I I I I I I I I I : I I I I I I I I I I I I I I I 1 II I : I M I I I I I I I II I I I I I! I I II I II M I 
EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW 

130 140 150 160 170 180 



30 



35 



40 



45 



orf 121-1 . pep 
orfl21ng-l 



orf 121-1 .pep 
orf 121ng-l 



orf 121-1 .pep 
orf 121ng-l 



190 200 210 220 230 240 

SCG I AKLVPRRFAGAYTR I TGNLNEVLGEFLRGQLLVML I MGLVYGLGLVLVGLDSGFAI 
I I I I II II II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II M I M I I : I I I I I I I I I I 
SCG I AKLVPRRFAGAYTR I TGNLNEVLGEFLRGQLLVML I MGLVYGLGLMLVGLDSGFAI 

190 200 210 220 230 240 

250 260 270 280 290 300 

GMLAG I LVFVPYLGAFTGLLLATVAALLQFGS WNG I LSVWAVFAVGQFLES FFIT PKI VG 
I I : I I I I I I I I I I I I I I I I I I I M I I I I I I I I II II I : I I I I I I I I I I I 11 I II I M I I I 
GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 

250 260 270 280 290 300 

310 320 330 340 350 

DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 
I I II I I I I I I I II I I I I I : I I I I I I I I I I I II I M M I I I I I I : I I M I I I II I 1 I I 
DRIGLSPFWV I FSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQFTY FAG SFYRGRX 
.310 320 330 340 350 



50 



55 



60 



65 



In addition, ORF121ng-1 shows homology to a permease from H. influenzae:

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349
Score = 69.9 bits (168), Expect = 2e-11

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)

Query: 2 6 VYALG DTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84 

+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

Sbjct : 32 IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

Query: 85 MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 14 3 
ML Q +L S LP + N WL N Y E ID + + + F + ++ + 

92 MLWNQTISLLSDLPAMF NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 14 7 



Sbjct : 
Query : 



14 4 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGNL 203 



+ + + N+VS D G+++ +P+ A+ R + 

Sbjct: 148 SAVKLSLASIMNLVSLGIYAFLVPLMMFFMLKDKSELLQGVSRFLPKNRNLAFXRWK-EM 206 

Query: 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263 

+ + ++ G+ + + G+ V VPY 

Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 




Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323 

QFG + FAV QL+ + P + + + LP +1 S++ FG L GF 

Sbjct: 267 LVALFQFGISPTFWYIIIAFAVSQLLDGNLLVPYLFSEAVNLHPLIIIISVLIFGGLWGF 326 

Query: 324 VGMLAGLPLAAVTLVLL 340
G+ + PLA + ++
Sbjct: 327 WGVFFAIPLATLVKAVI 343

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the two proteins, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 94 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 791>:

1 ..ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT
51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC
301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG..

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>:

1 ..TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC
101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ..
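
The listings in these examples are printed in blocks of ten residues, each line led by a position number. The following is a minimal sketch, assuming Python and a hypothetical helper name `join_listing` (neither is part of the original disclosure), of how such a listing could be collapsed into one contiguous string before any further analysis:

```python
import re


def join_listing(lines):
    """Collapse a numbered, space-separated sequence listing into one string.

    Position numbers, blanks and the '..' markers that flag a partial
    sequence are dropped; residue letters and the '*' stop symbol are kept.
    """
    kept = []
    for line in lines:
        # Remove everything that is not a residue letter or a stop symbol.
        kept.append(re.sub(r"[^A-Za-z*]", "", line))
    return "".join(kept).upper()


# First and last lines of the partial ORF122 listing (SEQ ID 792).
orf122_fragment = join_listing([
    "1 ..TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR",
    "151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ..",
])
print(orf122_fragment)
```

Upper-casing the result is a design choice of this sketch; some listings use lower case for individual bases, and a caller who wants to keep that distinction would omit the `.upper()` call.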

Further work revealed the complete nucleotide sequence <SEQ ID 793>:

1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT
751 CGTCATCGTT TGTGTTCCTG A

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>:

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*
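
The amino acid listing follows from the nucleotide listing by reading successive codons from position 1. The sketch below assumes the standard genetic code and uses illustrative names (`CODON_TABLE`, `translate`) that do not appear in the original text; it reproduces the first ten and the last seven residues of SEQ ID 794 from the corresponding stretches of SEQ ID 793:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # Codons containing an ambiguous base (e.g. 'N') are not in the table.
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# Nucleotides 1-30 and 751-771 of SEQ ID 793.
print(translate("ATATCGTACTGGGCAAGCAGTTCGCCGGAT"))  # ISYWASSSPD
print(translate("CGTCATCGTTTGTGTTCCTGA"))           # RHRLCS*
```

Mapping unknown codons to `X` mirrors the way the listings mark undetermined residues, but that convention is an assumption of this sketch.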

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF122 shows 94.0% identity over a 182aa overlap with an ORF (ORF122a) from strain A of N.
meningitidis:

10 20 30 

orf 122 . pep TAFSAALRLSPSXLVI FLSFGKPYQQTAAI 

I I M M : I I I I : I I I I I I I I I I M I II I 
orf 122a FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWI FLSFGKPYQQTAAI 

30 40 50 60 70 80 

40 50 60 70 80 90 

orf 122 . peD LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 
I I I I I I I I I It I I I I I i I I I II I I I II I : I I I I I I I I II I 1 I I \ I I I I I I I I I I I I 
orf 122a LTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAFXVDARNVYAQIGGDVGTHLR 

90 100 110 120 130 140 



100 110 120 130 140 150 

orf 122 . pep N VRRE CG FL CN HGR I D I DRL PT LR LN AL I RRTQKDAAVR I FE L CGG VGEMAAD I AQT CRT 
I : I I 1 I I I I II I I I I I II I II I I I ! I I I I I I I I M I I I I I I I I I I II .1 I II I I I I I I I I 
orf 122a NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 

150 160 170 180 190 200 



160 170 180 

orf 122 . pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 
I I I I I II I I I I I M I I I M I II I II I I I I I I I 
orf 122a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 
210 220 230 240 250 

The complete length ORF122a nucleotide sequence <SEQ ID 795> is:

1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC
301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG
351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT
751 CGTCATCGTT TGTGTTCCTG A

This encodes a protein having amino acid sequence <SEQ ID 796>:

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR
101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



10 20 30 40 50 60 

orf 122a . pep ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 
MINIM M I II II M II M M M M II II M I I II I II : M M II i I I I M I II M I 
orf 122-1 ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVE PVPMPIYSFSGTNSTAFSAAMRLS 

10 20 30 40 50 60 



70 80 .90 100 110 120 

orf 122a. pep SSCWI FLSFGKPYQQTAAI LTFFXTSCPPRSNPYQQYRRLRLYAFHAPE ITEFFVGFAF 
I M II M I II M M I I M I I I I I I I M M M I I M I I 1 I I II II I M I : I I I II I M 
orf 122-1 SSCWI FLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
130 140 150 160 170 180

orf 122a . pep XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 
I I I I ! I I I I I I I I I I I I M ! : I I I I I I I I I I II I I I I I I II M I I I M I I I M I M I M 
orf 122-1 DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 122a . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
I I I I I i I I I I I I I 11 I I I I I I I I II I I I I ! I I I I I 1 I I II I I I I II I M I II I I I I I I I I 
orf 122-1 FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

190 200 210 220 230 240 

250 

orf 122a. pep D I VAL S DT D VRHRLC S X 
I I II i I I I 1 I I I I I i I I 
orf 122-1 DIVALSDTDVRHRLCSX 

250 
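
The identity figures quoted for these alignments can be checked by counting matching columns. The sketch below is illustrative only: the function name `percent_identity` is hypothetical, and the convention of excluding gap columns from the overlap is an assumption, since the text does not state how the original alignment program computed its percentages. It is applied to residues 1-60 of ORF122a and ORF122-1 as printed in the alignment above.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two already-aligned, equal-length strings.

    Columns where either sequence has a gap ('-') are skipped (assumption).
    An undetermined residue 'X' only matches another 'X'.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    overlap = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        overlap += 1
        if a == b:
            identical += 1
    if overlap == 0:
        raise ValueError("no overlapping columns")
    return 100.0 * identical / overlap


orf122a_1_60 = "ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS"
orf122_1_1_60 = "ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS"
print(round(percent_identity(orf122a_1_60, orf122_1_1_60), 1))
```

Over the full 256-residue overlap the two listings differ at eight positions, and 248/256 rounds to the 96.9% stated above.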

Homology with a predicted ORF from N. gonorrhoeae

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from 
N. gonorrhoeae: 






orf!22.pep 
orf 122ng 
orf 122 .pep 
orfl22ng 
orf 122 .pep 
orf 122ng 
orfl22 .pep 
orf 122ng 



TAFS AALRLS PSXLVI FLS FGKP YQQTAAI 3 0 
I I I I I I : M I I : II I I II II II I II M I 
FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWIFLSFGKPYQQTAAI 80 

LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 90 
M I II II I I I I I I I I I I M I I I I I I I I I I I I I M I M I I : I I I I : : I I II I I II I I I 
LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 14 0 

NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150 
II! I I I I I I M 1 I I I I : I I I I I I I I I I I I I I I I I II I I I M I I II I : I I I I : I I II I I 
NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200 

EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182 
11111111111:11 : I I I II I II II I II I I 

EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256 



The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 
1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT
201 TTTAtCCttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT
251 TTTGCACGtC ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT
751 CGTCATCGTT TGTGTTCCTG A



This encodes a protein having amino acid sequence <SEQ ID 798>: 



1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI
251 RHRLCS*






ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap: 






orf 122-1 -pep 
orf 122ng 



orf 122-1 .pep 
orf 122ng 



orf 122-1 .pep 
orfl22ng 

orf 122-1 .pep 
orf 122ng 



orf 122-1 .pep 
orf 122ng 



10 20 30 40 50 60 

ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 
: I I I M I I t ! M I : I I It I I I I I I I M M I I I I I I I 1 I I I : I M I I I I M I I I I I I I M 
MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 

10 20 30 40 50 60 

70 80 90 100 110 120 

SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 

I | | M I I I I I I I I I M I I I I I I I If I I i I I I I I i M I I I I I i I I I I I I i ! 1 I I 1 I I I I 
SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 
70 80 90 100 110 120 

130 140 150 160 170 180 

DVDARN V YAQ I GG D VGTH LRNVRRE FG FLCNHGRI D I DRLPT LRLNAL I RRTQKDAAVR I 
I : | I I I : : I I M I I i I II I II I I I I I I I I i I I I I I I : I I I I I I I I I I I I I I I I I I I I 1 
DIDARNIDTQIGGDVGTHLRNVRCEFG FLCNHGRI DIDHLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

I I I I I I I I : M I I: I I I I I I 1 I I I I I I M I I : i I : I I II I I I I ! I I I I I i I I I I I I I I 
FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV 
190 200 210 220 230 240 

250 

DIVALSDTDVRHRLCSX 
I I I M I I I I : I M I It I 
DIVALSDTDIRHRLCSX 
250 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 95 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>:

1 ..GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT
51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG
101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG
151 ATGGGGCGGA TTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>:

1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP
51 MGGFDCRLFR LETA*

Further work revealed the complete nucleotide sequence <SEQ ID 801>:

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC
101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC
201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG
301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT
351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA
401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC
451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT
501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT
551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG
601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT
651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG
701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG
751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC
801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA
851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG
901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT
951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG
1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT
1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT
1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG
1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA
1201 TCTTTACAAA GGAACCCGTC ATGA

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>:

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ
401 SLQRNPS*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF125 shows 76.5% identity over a 51aa overlap with an ORF (ORF125a) from strain A of
N. meningitidis:

10 20 30 

orf 125 . pep AGASANN I S ARFAETPVAVS VTLIGTVLAV 

I i : I I I I I I I : : : I t : I I : I : : : M : I I I 
orf 125a KI LLGAGLGAAGI LAWLSTVTTT FLDAYSAGVS ANN I S AKLSE I PI AVAVAWGTLLAV 

250 260 270 280 290 300 



40 50 60 

orf 12 5. pep MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX 

: I I I I I I I I I I I I I ! I I I I I I : 
orf 125a LLPVTE YEN FLLL IGSVFAPMAAVLI ADFFVLKRREE I EG 

310 320 330 340 

The ORF125a partial nucleotide sequence <SEQ ID 803> is:

1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC
101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC
201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG
301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT
351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA
401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC
451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT
501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT
551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG
601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT
651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG
701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG
751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC
801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA
851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT
901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT
951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG
1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C.

This encodes a protein having the partial amino acid sequence <SEQ ID 804>:

1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG..

ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap:



10 20 30 40 50 60 

or f 12 5a. pep MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
I I M I I I I I I : I I M I M I I I I M I I I I I I I I I I I M M I I I II I M I I II II I M M I 
orf 125-1 MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLIAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 125a . pep AY I GALTGXXSME S VRLS FGKRG S VLFS VANMLQLAGWTAVM I Y AG AT VS SALGKVLWDG 
I I I I I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I M I I M II I I II II I I I M I I I 
orfl25-l AY IG ALTGRS SMES VRLS FGKRG S VL FSVANMLQLAGWTAVMI YAGATVS SALGKVLWDG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 125a . pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD 
I | M I I II II I II II I I I I I II I II I I II I I I I I II I I I I II I I I I I I I I I I I I I I I 
orf 125-1 ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 125a . pep GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 
I I | I 1 I I I I I I I I M M I I I 1 I I I I I I I I I I I I I I I M M I I I I I I I I I II I I I I I I I II 
orfl25-i GMS FGT AVELS AVMPLSWL PLAADYTRHARRPFAATLTAT LAYT LTGCWMYALG LAAALF 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 12 5a. pep T G E T D V AK I L LG AG LG AAG I LA WL S T VTTT FL D A Y S AG V S AN N I S AK L S E I P I AVA V AV 
M II I I II I I I I M I I I M I I I I I I I I II M I I I II I i I : II II I I I : : : I I : I I : t : : 
orf 125-1 TGET DVAKI LLGAGLGAAG ILAWLSTVTTT FL DAY SAGAS ANN I S ARFAET PVAVGVTL 

250 260 270 280 290 300 

310 320 330 340 

orf 125a. pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 
: I I : I I I : I I I II I I I I I It I I I I I I I 1 I I I M I M M I I I M I I I I 
orf 12 5-1 IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 

310 320 330 340 350 360 

Homology with a predicted ORF from N. gonorrhoeae 

ORF125 shows 86.2% identity over a 65aa overlap with a predicted ORF (ORF125ng) from 
N. gonorrhoeae: 






orf 125 . pep 
orf 125ng 
orf 125. pep 
orfl25ng 



AG AS ANN I S AR FAET P VAV S VT L I GT VLA V 
I I I I I I I I I I II II 1111:1111 I I 1 I I 
KI LLGAGLGITG I LAWLST VTTT FLDT YS AGAS ANNI SARFAE I PVAVGVTL I RTVLAV 

ML P VTE YEN FLLL IGS V FAPM- GG FDCRL FRLETA 64 
I I I I M I : I I I I I I 111:11 I I I I I I II 1:11 
MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343 



30 
308 






An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA*






Further work revealed the following gonococcal DNA sequence <SEQ ID 807>: 



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC
101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC
201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG
301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT
351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA
401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC
451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT
501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA
551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG
601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC
651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT
701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC
751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC
801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA
851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC
901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA
951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA
1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC
1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT
1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT
1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC
1201 CAATCTTTAC AAAGGAACCC GTCATGA



This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>:

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT
301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD
351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT
401 QSLQRNPS*



ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap:



10 20 30 40 50 60 

orf 125-1 .pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
I I I I 1 I I I i I I : I I I I : I I I M I I M I I I I I I I I II I I I I i I I I I M II I II I I M M I I 
o r f 1 2 5ng - 1 MSGNAS SPSS SAAIG LVWFGAAVS I AE I STGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 4 0 50 60 

70 80 90 100 110 120 

orf 125-1 .pep AY I GALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVS SALGKVLWDG 
I I II I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I II I I 
orfl2 5na-i AY I G ALTGRS SME S VRLS FGKC G S V L FS VANMLQLAGWT AVM I YVGATVS SALGKVLWDG 

70 80 90 100 110 120 

130 140 150 160 170 179 

orf 125-1 .pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
I I I I I I I I I I I I I I I II I I I I I I : I I I I I I I I M I I I I I I I I I I : I I I ::: I :: i I I 1 
orfl25ng-l E S FVWW ALANG AL I VLWLVFGARRTGGLKT VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS 

130 140 150 160 170 180 

180 190 200 210 220 230 239 

orf 125-1 . pep DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 
I I I : i I M I I I I I I M 1 I I i I I I I I I I I : I I I I I I M I II I I I I I I I I I I I I I I I I M I I 
orf 125ng-l DGMT FGTAVE LSAVMPLSWL PL AADYTRQARRPFAATLT AT LAYTLTGCWMYALGLAAAL 

190 200 210 220 230 240 

240 250 260 270 280 . 290 299 

orf 125-1 . pep FTGETDVAKI LLGAGLGAAG I LAWLSTVTTT FLDAYS AGASANN I SARFAET PVAVGVT 
I I I I I I I I I I I I I I I I I : I I I I 11 I I M I I I I I I : 1 I i 1 I I I I II I It I I I I I I I I I 1 
orfl25ng-l FTGETDVAKILLGAGLGITGILAWLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVT 

250 260 270 * "280 290 300 



300 310 320 330 340 350 359

orf 125-1 . pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 
I I I i I M I I I I I I I I : I t I I I I I I I I I I M I I II ! I ! I II I I I I I I M I I I II I I I I I I I 
5 orf 125ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

310 320 330 340 350 360 



360 370 380 390 400 

orf 125-1 . pep FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 
10 I I I ! I I I I II 1 I I I I I I I I ! I I I I I I I I I i I it I I I I I I I I I M I I I I i 

orfl25ng-l FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.



Example 96 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC
51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG
151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG
201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA
351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC
401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG
451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT
501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG..

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA
51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ
151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK...

Further work revealed the complete nucleotide sequence <SEQ ID 811>:
1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC
51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG
151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG
201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA
351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT
501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG
551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG
601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA
651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC
701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG
801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG
901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT
951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA
1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG
1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA
1101 A



This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA
51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV
251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA
351 PERDKESGLA YIRRQD*



Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF126 shows 90.0% identity over a 180aa overlap with an ORF (ORF126a) from strain A of N.
meningitidis:






10 20 30 40 50 60 

orf 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 
I I I I I I I t II I I I I I I I I I I I I I I I I I I M I I : M I I I I I I I I I I M I I I I i I : I I I I I 
orfl2 6a MTRIAILGGG LSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12 6. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 
I I ! I i I I I 111111111:1:1 : II I I I I I I I I I I I i I I M : I I I I I I M I! : I I I 
orf 12 6a EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12 6. pep VRWRADDIAEREPQLGGRFXDGI YLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 
I 1 II I I I I M i I I 1 I 1 I II I I I 1 I I I I I I I I I I : I I I I t I I I I I II I I I I I I M : II 
orf 12 6a VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 .140 150 160 170 180 

The complete length ORF126a nucleotide sequence <SEQ ID 813> is:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC
51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG
151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG
201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG
251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA
301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA
351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT
501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG
551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG
601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA
651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC
701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG
801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG
901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT
951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA
1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG
1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA
1101 A



This encodes a protein having amino acid sequence <SEQ ID 814>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA
51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK
101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA
201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA
351 PERDEESGLA YIRRQD*

ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 



In each block below, the upper sequence is orf126a.pep and the lower sequence is orf126-1.



10 20 30 40 50 60 

MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I I i M I I I I II I I I I I I I M I I I I II I II I I I I I I I I I II I II I M I I I I I 1 I I I I I M I 
MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI
i I I I I I II I I 1 I I I I I I : I : I : I I II 1 I I M I I I I I I I I I : II I I I I I I I I I I I I I 
EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI

70 80 90 100 110 120 

130 140 150 160 170 180 

VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 
I II I I I I I I I I I I II I I I I I M I I I I I I I I I I I I I I I M I I I I I I II I I I I I I I I M : I I 
VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 

130 140 150 160 170 180 

190 200 210 220 230 240 

DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
I I i I I M I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I || I | | I I I I I | | I 
GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

190 200 210 220 230 240 

250 260 270 280 290 300 

LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 
I I I I 1 I I I I I I M 11 I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I | | M I I I II I I I 
LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA 
I M I I I I I I I I I I I I I I I I I I I M M 1 I I i I I I I I I I : I I II I II I I I 11111:11111 
LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 

310 320 330 340 350 360 



YIRRQDX 
I I I I I I I 
YIRRQDX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N. gonorrhoeae: 



In each block below, the upper sequence is orf126.pep and the lower sequence is orf126ng.



MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 
I I I I I : I I I I I I I I t M M I I I I I I M MM: I : I I I I I I I II I I I M M I : I I I I I 

MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120
I I : I I I M I I I II I M I I ! I I I I I 1 1 I I I I I I I I I M I I I I I I I I I II I II I I : I I M 

EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120 

VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180 
I M I I I : I I I I I M II I II M M M M I I I I I I : I I M I I I I 1 I I M I I I I I II : I : 

VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180 



An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino acid sequence <SEQ ID 816>:



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS
251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA
351 PERDEESGLA YIGRQD*



Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 



1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC
51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA
101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG
151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG
201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA
351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT
501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG
551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG
601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA
651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC
701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG
801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG
901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT
951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA
1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG
1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA
1101 A

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>:



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA
351 PERDEESGLA YIGRQD*



ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap:



In each block below, the upper sequence is orf126-1.pep and the lower sequence is orf126ng-1.



10 20 30 40 50 60 

MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I | I | | : I I I I I I I M I I 1 II M I I I I I I I 1 1 I I : I I I II I I I I I I I I i I I I I I 1 I I I I 
MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI

I I : I I I I i I I I I I I I I II I M I I I I I I I I I I I I I II I I I I I M I I I I I M I I I I I I I I I 
EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
70 80 90 100 110 120 

130 140 150 160 170 180 

VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
I I I I I I : M I M II I I I I I 1 I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I : 
VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ

130 140 150 160 170 180 

190 200 210 220 230 240 

GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

I I I I I I I : I I 11 I I I I I I t I I II I I I I I I I I I I I I I I I I I I II I I 11 I I I I I 1 I I M I I 
DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
190 200 210 220 230 240 




250 260 270 280 290 300 

orf126-1.pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT
i I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I M I I : I I II I M I 1 I I M : M I i I 
orf126ng-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT
250 260 270 280 290 300

310 320 330 340 350 360 

orf126-1.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA
I I i I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I : M I I I I I I I I I M I I I: II I I I 
orf126ng-1 LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA

310 320 330 340 350 360 

orf126-1.pep YIRRQDX
|| ||||
orf126ng-1 YIGRQDX

Furthermore, ORF126ng-1 shows homology to a putative Rhizobium oxidase flavoprotein:

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327

Score = 169 bits (423), Expect = 3e-41
Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)

Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62
RI V G G++G A QL G+++ L ++ G
Sbjct: 2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122
+ LGR + W + G+L+V G+D F R G DE+
Sbjct: 61 LTLGRLAADWWEAA LPGHVHRRGTLVVAGGRDTGELDRFSRRTS-GWEWLDEVA- 113

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182
IA EP L GRF ++ E LD RQ L+ALA L++ + +
Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242
+ D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y
Sbjct: 166 DVDHDRVVDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302
I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331
+ P R ++E R + +NGL+RHGF+++P
Sbjct: 279 DNLP--RVTQEGRTLHVNGLYRHGFLLAP 305

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 97 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 819>:

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC
201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG
301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA
351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT
401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA
451 GTAG






This corresponds to the amino acid sequence <SEQ ID 820; ORF127>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK
151 *

Further work revealed the following DNA sequence <SEQ ID 821>:



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC
201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*
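
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating the reading frame with the standard genetic code. The following Python sketch is illustrative only and is not part of the analysis described in this document; the function name `translate` and the convention of writing `X` for codons containing ambiguous bases are assumptions. It is applied here to the first 60 nucleotides of <SEQ ID 821>.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored.

    Codons that are not in the table (for example those containing N)
    are written as 'X'; stop codons are written as '*'.
    """
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# First 60 nucleotides of SEQ ID 821, copied from the listing above.
fragment = ("ATGACTGATA ATCGGGGGTT TACGCTGGTT "
            "GAATTAATAT CAGTGGTCTT GATATTGTCT")
print(translate(fragment))
```

Under these assumptions the fragment translates to MTDNRGFTLVELISVVLILS, which agrees with the first 20 residues of <SEQ ID 822>.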

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF127 shows 98.0% identity over a 150 aa overlap with an ORF (ORF127a) from strain A of N. meningitidis:

10 20 30 40 50 60
orf127.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
I ! I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I ! II : I II I I I I I I I I I I 1 1 I M I
orf127a MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
10 20 30 40 50 60

70 80 90 100 110 120
orf127.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL
I I I I I I I II I I I I I I it I I I I M I I I I I II I I I I I 1 I I I I I I I M I I I I I I I 11 1 I II
orf127a GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL
70 80 90 100 110

130 140 150
orf127.pep VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
I I I I I I I I I I M I I I I I II II I I I I I I II I I
orf127a VTFICKKSASSCSDGLDYFKGNDKDCKLLKX
120 130 140 150

The complete length ORF 127a nucleotide sequence <SEQ ID 823> is: 



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC
201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This encodes a protein having amino acid sequence <SEQ ID 824>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*






ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap: 

                       10        20        30        40        50        60
orf127a.pep    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN
               |||||||||||||||||||||||||||||||||||||||:||||||||||||||||||||
orf127-1       MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf127a.pep    GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127-1       GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                       70        80        90       100       110       120

                      130       140       150
orf127a.pep    TFICKKSASSCSDGLDYFKGNDKDCKLLKX
               ||||||||||||||||||||||||||||||
orf127-1       TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                      130       140       150
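
The identity figures quoted for these overlaps can be illustrated by counting the aligned positions at which both sequences carry the same residue. The following Python sketch is an assumption-laden illustration rather than the alignment program used for the analysis: the function name `percent_identity` is hypothetical, the two input strings must already be aligned to equal length, and gap columns (`-`) are counted in the overlap length but never as matches. The example uses residues 31 to 50 of <SEQ ID 824> and <SEQ ID 822>.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns in which both sequences agree.

    Both arguments are rows of an existing pairwise alignment, so they
    must have the same length; '-' marks a gap and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Residues 31-50 of ORF127a and ORF127-1, copied from the listings above.
orf127a_fragment = "RNYVEKAKINTVRAALLENA"
orf127_1_fragment = "RNYVEKAKINAVRAALLENA"
print(round(percent_identity(orf127a_fragment, orf127_1_fragment), 1))
```

The two fragments differ at one of 20 positions, giving 95.0%. Applied to a full 149 aa overlap containing a single difference, the same calculation gives 148/149, or about 99.3%, which is consistent with the figure stated for ORF127a and ORF127-1.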

Homology with a predicted ORF from N. gonorrhoeae

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from N. gonorrhoeae:

orf127.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60
I I I I I I I II I I I I I I I I I I I I I E I I I M I M I I I I I I I 1 M I I I I : I I I I I I I I I I I I i I
orf127ng MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60

orf127.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120
I I I I I I I I II I I I I I I I i I Ill I I M I I I I I I I I I I I II I I I I I I M I I I I I
orf127ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119

orf127.pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150
II I I I I I I I II I I I I I 11 I I I 1 I I I II I I
orf127ng VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149



The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC
201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This encodes a protein having amino acid sequence <SEQ ID 826>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK*

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 

                       10        20        30        40        50        60
orf127-1.pep   MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1     MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf127-1.pep   GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf127ng-1     GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
                       70        80        90       100       110       120

                      130       140       150
orf127-1.pep   TFICKKSASSCSDGLDYFKGNDKDCKLLKX
               ||||||||||||||||||||||||||||||
orf127ng-1     TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                      130       140       150

This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 98

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827>:

1 ..GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT
51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT
101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA
151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC
201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG
251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC
301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT
351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG
401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA
451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT
501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT
551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT
601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT
651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC
701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG..

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>:

1 ..VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN
51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS
101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK
151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL
201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA..

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCC 
35 51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

TTCTTTGTCA TCTCAGGATT CCTCATTACC 
ACAGAACGGT TCTTTTTCTT TCCGGGATTT 
GGATTTATCC TGCCTTTATT GCGGCCGTGT 
TCTCAAATCT TCCTTTACGA AGATTTCAAC 
40 301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 
CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 
TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 
501 GCGTAACATC AG CAT CAT CC TGTTTTTGAT TTTGACTGCC 

45 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC 

AAC AG CAAAT GGAAAACGGC 
TGCTTGCCTG CCTGTTCGTG 
ATGACCCTGC TCCTTCCCTG 

50 801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

CCCATCGTAT TTGTCGGCAA AATCTCTTAT 
GATTTTTATT GCTTTCGCCC ATTACATTAC 
TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

AGTTATTATT T GAT T G AAC A GCCGCTTAGA 

55 1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 
AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GG AAAAT CAT 



1 


ATGCAAGCTG 


51 


CGTGCTATCC 


101 


GATTCCTGGG 


151 


GGCATCATTC 


201 


TTATACCCGC 


251 


CGCTGGCTTC 


301 


CAAATGCGGA 


351 


TCTGGGGTTT 


401 


TACTGCATAT 


451 


CCCCTTTTGC 


501 


GCGTAACATC 


551 


TGCCAAGCGG 


601 


CTTTCGACAC 


651 


TTACGGGCAA 


701 


AGTTGCTTTC 


751 


ATTGACAAAC 


801 


CCTGCTGACG 


851 


CCCGCATCCT 


901 


TCCCTATACC 


951 


AGGCGACAAA 


1001 


CGGCCGGATT 


1051 


AAACGGAAGA 


1101 


GTCCCTGATA 


1151 


AGGAACACCT

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA
1851 CGGCGGCGCh TTGCAGTAG



This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>:



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGXPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSHGGA LQ*



Computer analysis of this amino acid sequence gave the following results:

Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number U32723)
ORF128 and HI0392 show 52% aa identity in 180 aa overlap:

Orf128: 1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
++L S IAS IF+Y DFN++RKT+EL+ FLSN YLG QGYFDLSA+ENPVLHIWSLAV
HI0392: 46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128: 61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120
E Q I KK + ++VL I++ILF IL A+SF+ + FY ++L+QPN YYLS
HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILIATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180
LRFPELL GSLLA+Y N + Q + +L+ L L +CLF+++ + FIPG+T
HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

Homology with a predicted ORF from N. meningitidis (strain A)

ORF128 shows 98.0% identity over a 244 aa overlap with an ORF (ORF128a) from strain A of N. meningitidis:

10 20 30
orf128.pep VSLASVIASQIFLYEDFNQMRKTVELSAVF
I I I t I II I I I I II I t I II I It I I I I I I M I
orf128a ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF
60 70 80 90 100 110

40 50 60 70 80 90
orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
I I \ I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I II I I 1 M I I I I I I II I
orf128a LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
120 130 140 150 160 170

100 110 120 130 140 150
orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
I I I M I I I : I I I I I I I II I I I I I I I II 1 I I I I I I II II I I I It ! M I I I M I I I I I I I I I
orf128a ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
180 190 200 210 220 230

160 170 180 190 200 210
orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
I I 1 I M M | I M I I I I i I I I I M f I I I 1 I I I M I I I M I I I I I M I I t II I I I I I I I I I I
orf128a RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
240 250 260 270 280 290

220 230 240
orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
I | M II I I I I I I 11 I I 1 I I II I I I I I 1 M I
orf128a VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR
300 310 320 330 340 350

orf128a KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH
360 370 380 390 400 410

The complete length ORF128a nucleotide sequence <SEQ ID 831> is:

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC
51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC
151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC
301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA
351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCGG
401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT
501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT
551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC
701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG
751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG
801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC
1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG
1851 CGACGGCGCA TTGCAGTAG

This encodes a protein having amino acid sequence <SEQ ID 832>: 

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSRDGA LQ*

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 

orf 128a. pep MQAVRYRPEI DGLRAVAVLSVMI FHLNNRWLPGGFLGVDI FFVI SG FLITG 1 1 LSE IQNG 

I I I I I I M I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I U I I I 1 M I I M I M I I I 

orf 128-1 MQAVRYRPEI DGLRAVAVLSVMI FHLNNRWLPGGFLGVDI FFVI SGFLI TGI I LSE IQNG 

orf 128a . pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

II M I I I I II I I I I I M II I I I I I I II I I I I I I I I I I I I I M I II I I I I I M I I I II I I i 
orf 128-1 SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

orf 128a . pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
I I I I I I M II I I I I II I I I I I I I I M II I II I I II I I I I I I I I M II I I I II I I I It M I 
orf 128-1 QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 

orf 128a . pep TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
: I I I I II I I I I I I M I I I I I II I I II I I I I I I I I M I I I I I I I I I I I M I I M I I I I I I I 
orf 128-1 SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

orf 128a . pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
I I I M I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I 
orf 128-1 FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

orf 128a . pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAG FSLLSYYLIEQPLRKRKMTFKKAF 
I I I I I I I I I I I I M I I I M I I II I I I I I I II I I M I II I I I M I I I I I I I I I I I I I I I II 
orf 128-1 SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALT AG FSLLSYYLIEQPLRKRKMTFKKAF 

orf 128a. pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 
I I M I I I I I I II M I I I II I I I I I I I I I I I I I I I I I I I I II I I I II II II I I I I M I II I 
orf 12 8-1 FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

orf 12 8a. pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 
I II I I I I I I II I M I II I I II I I I M I I II I I I I I I I I M I I I I I I I I I I I I I I II I I I I 
orf 128-1 DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

orf 128a. pep PV PRFEAQS FLI PG FPARFRET VKRI AAVKPVYVFANNT S I SRS PLREEKLKRFAANQYL 
I I II t I I I I I M I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I M I I I II I I I I I 
orf 128-1 PVPRFEAQS FLI PGFPARFRETVKRIAAVKPVYVFANNTS I SRS PLREEKLKRFAANQYL 

orf 128a . pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEI YGRYLYGDQDHLTYFGSY 
I II I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I M I I I I II I I I 
orf 128-1 RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEI YGRYLYGDQDHLTYFGSY 

orf 12 8a . pep YMGREFHKHERLLKSSRDGALQX 
I I I I I I 11 I I I I I M I : I I I I I 
orfl28-l YMGRE FHKHERLLKS S HGGALQX 



Homology with a predicted ORF from N. gonorrhoeae

ORF128 shows 93.4% identity over a 244 aa overlap with a predicted ORF (ORF128ng) from N. gonorrhoeae:



orf128.pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 30
I I I I I I I I I I I I I 1 I II I I I I I I : I I I : I I
orf128ng ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112

orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90
I I I I I I I I : I I M I I I I M I I I I I I I I I I I I I M I M I I I I I I I I I I I 1 I M I I I I I I
orf128ng LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172

orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150
I I I I I I I M I II I : M I I I I I I I I.I I M I I I I I I I I I I : II I I I I I I I I I I I I I I I III
orf128ng ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232

orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210
II I I I I II I I I I : i I I 1 I I I I : I I I I I : M I I I I I I I II I I I I I II I I I I I II I M I II
orf128ng RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292

orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
II I I I I I I I I M I I I I I ! II II I M
orf128ng VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is:

   1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC
  51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
 101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC
 151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
 201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
 251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC
 301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA
 351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
 401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT
 451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT
 501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT
 551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC
 601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT
 651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC
 701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG
 751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG
 801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
 851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
 901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC
 951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC
1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT
1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC
1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG
1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA
1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG
1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG
1851 AGGCGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 834>:

   1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT
  51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
 101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY
 151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY
 201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV
 251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
 301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
 351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN
 401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD
 451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR
 501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG
 551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY
 601 YMGREFHKHE RLLKHSRGGA LQ*



ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap: 






orf128-1.pep  MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
              |||||||||||||||||||||:||||||||||||||||||||||||||||:|||||||||
orf128ng      MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG

orf128-1.pep  SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
              |||||||||||||||||||||||||||||||||||||||||||||:|||:||||||||||
orf128ng      SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF

orf128-1.pep  QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
              : ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf128ng      RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA

orf128-1.pep  SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
              |||||:||||||||||||||||||||||||:||||||||||||||||| |||||||| ||
orf128ng      SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC

orf128-1.pep  FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
              |||||:||||||||:|||||:|||||||||||||||||||||||||||||||||||||||
orf128ng      FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128-1.pep  SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128ng      SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128-1.pep  FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
              |||||||||:|||||||:||||||||||||||:|:||||:||||||||||||||||||||
orf128ng      FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL

orf128-1.pep  DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
              ||||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128ng      DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128-1.pep  PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
              ||||||||||||||| ||||||||||||||||||||||||||||||||||||||| ||||
orf128ng      PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL

orf128-1.pep  RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
              |||:|||||||||||||||:|||||||||||||||||||||| |||||||||||||||||
orf128ng      RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY

orf128-1.pep  YMGREFHKHERLLKSSHGGALQX
              |||||||||||||| |:||||||
orf128ng      YMGREFHKHERLLKHSRGGALQX
                     610       620






In addition, ORF128ng shows homology to a hypothetical H. influenzae protein:

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62

Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)

Query: 38  VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
           +DIFFVISGFLIT II++EIQ  SFS + FYTRRIKRIYP                F+Y
Sbjct: 1   MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query: 98  DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
           DFN++RKTIEL+ FLSN YLG GYFDLSA+ENPVLHIWSLAVE Q I
Sbjct: 61

Query: 158
           YKK + ++VL I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
           Y    N + Q       +L++L    L  CLF+++ +  FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224
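
The percentages on the BLAST summary line for HI0392 can be checked against the reported counts. The following is a minimal Python sketch, not part of the original analysis: the function name `parse_blast_summary` is hypothetical, and the assumption that the percentage is truncated rather than rounded is inferred from the reported values (152/225 is shown as 67%).

```python
import re

def parse_blast_summary(line):
    """Parse 'Name = num/den (pct%)' fields from a BLAST summary line.

    Returns a dict mapping each field name to (num, den, truncated_percent).
    Truncation (integer division) is an assumption inferred from the
    reported figures, not something stated in the source text.
    """
    stats = {}
    for name, num, den in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        num, den = int(num), int(den)
        stats[name] = (num, den, 100 * num // den)
    return stats

summary = ("Identities = 124/225 (55%), Positives = 152/225 (67%), "
           "Gaps = 1/225 (0%)")
for field, (num, den, pct) in parse_blast_summary(summary).items():
    print(f"{field}: {num}/{den} = {pct}%")
```

The recomputed values agree with the percentages printed in the summary line, which gives some assurance that the counts were recovered correctly from the scanned text.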

This analysis, including the identification of several putative transmembrane domains, suggests that 
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
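
The text does not say which method was used to identify the putative transmembrane domains. One common approach is a sliding-window hydropathy scan; the sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, which are conventional choices and not parameters taken from this document. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index per residue (standard published scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence.

    Unknown symbols (for example X) are scored as 0.0, an assumption
    made here so that ambiguous residues do not abort the scan.
    """
    scores = [KD.get(residue, 0.0) for residue in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_segments(protein, window=19, cutoff=1.6):
    """1-based (start, end) of windows whose mean hydropathy >= cutoff."""
    profile = hydropathy_profile(protein, window)
    return [(i + 1, i + window) for i, mean in enumerate(profile)
            if mean >= cutoff]

# Usage: scan the N-terminal region of ORF128ng <SEQ ID 834>.
n_terminus = "MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG"
print(candidate_segments(n_terminus))
```

Overlapping windows above the cutoff would normally be merged into a single candidate segment before interpretation; that step is omitted to keep the sketch short.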






Example 99 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>:

   1 ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT
  51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT
 101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC
 151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT
 201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT
 251 TTCCGTTTTT CGTC..

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 

   1 ..IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR
  51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV..

Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

   1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA
  51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT
 101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
 151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT
 201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
 251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
 301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
 351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
 401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
 451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
 501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
 551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
 601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
 651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
 701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>:

   1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
  51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
 101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
 151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
 201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*
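
The correspondence between a nucleotide listing and its amino acid sequence can be verified by translating the DNA in frame. The sketch below is illustrative only: it assumes the standard genetic code (which agrees with the bacterial table for every codon other than alternative start codons), reports any codon containing an ambiguous base as X, and uses a hypothetical function name `translate`.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base
# slowest, third base fastest); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate DNA in frame 1; whitespace is ignored.

    Codons containing anything other than A, C, G or T (for example
    '.', 'r', 'y', 'w' or 'N' in the listings) are reported as 'X'.
    A trailing partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE.get(dna[pos:pos + 3], "X")
                   for pos in range(0, len(dna) - 2, 3))

# First 30 nucleotides of <SEQ ID 837> give the first 10 residues
# of <SEQ ID 838>.
print(translate("ATGGATTTTC GTTTTGACAT TATTTACGAA"))
```

Applied to the first row of <SEQ ID 837>, this reproduces the N-terminus MDFRFDIIYE of ORF129-1, and an ambiguous base such as the one in `GCAAC.GCG` yields X, consistent with the X in the partial ORF129 sequence.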

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

                          10        20        30        40        50
orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                    10        20        30        40        50        60

                60        70        80
orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
            ||||||||||||||||||||||||||||||||||
orf129a     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
                    70        80        90       100       110       120

orf129a     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
                   130       140       150       160       170       180

The complete length ORF129a nucleotide sequence <SEQ ID 839> is: 

   1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA
  51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT
 101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
 151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT
 201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
 251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
 301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
 351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
 401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
 451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
 501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
 551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
 601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
 651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
 701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This encodes a protein having amino acid sequence <SEQ ID 840>: 

   1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
  51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
 101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
 151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
 201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap:

orf129a.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep  KRYNPQHRX
             |||||||||
orf129-1     KRYNPQHRX

Homology with a predicted ORF from N. gonorrhoeae 

ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
            ||||||||||||||||||||||||||||||||||
orf129ng    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino
acid sequence <SEQ ID 842>:

   1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
  51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF
 101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN
 151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>:

   1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA
  51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT
 101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
 151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT
 201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
 251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
 301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
 351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
 401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
 451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
 501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
 551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
 601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
 651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
 701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>:

   1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
  51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
 101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
 151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
 201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap:

orf129-1.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
              ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf129ng-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
              ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf129ng-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep  KRYNPQHRX
              |||||||||
orf129ng-1    KRYNPQHRX
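
The identity figures quoted for these pairwise comparisons can be recomputed from the aligned rows. The following Python sketch is a minimal illustration and is not the program used to generate the alignments: it assumes two already-aligned strings of equal length, treats a gap character `-` as a non-match, and uses the hypothetical function name `percent_identity`.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, overlap, percent) for two pre-aligned sequences.

    Both strings must have the same length; positions where either
    sequence has a gap ('-') are counted in the overlap but never as
    a match. The percentage is rounded to one decimal place.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    overlap = len(seq_a)
    return matches, overlap, round(100.0 * matches / overlap, 1)

# Third block of the ORF129-1 / ORF129ng-1 alignment (one R/C difference).
orf129_1 = "SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS"
orf129ng_1 = "SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS"
print(percent_identity(orf129_1, orf129ng_1))
```

Over the full 248 aa overlap the alignment shows two mismatches (R/C and V/A), and 246/248 rounds to the 99.2% stated above.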

In addition, ORF129ng-1 is homologous to an ABC transporter from A. fulgidus:

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP)
[Archaeoglobus fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)

Query: 65  VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124
           + S YV + RGTPL VQI + I +F P+ GI + E' A G +AL
Sbjct: 58  I STAYVEVI RGT PLLVQI L I VYFGLPAIGINLQPEPA GIIAL 99

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184
              SGAYI EI RAGI+SI  GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242
           LLKDSSLLSVI++ EL  V   I         P    AL YL+MT  L  +    +K+
Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217

This analysis, including the identification of transmembrane domains in the two proteins, suggests 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies.

Example 100

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 

   1 ..CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA
  51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT
 101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC
 151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA
 201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT
 251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT
 301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc
 351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA
 401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC
 451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC
 501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT
 551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr

This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

   1 ..LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI
  51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL
 101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA
 151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE*

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

   1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
  51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
 101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
 151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
 201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
 251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
 301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
 351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
 401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
 451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
 501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA
 551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC
 601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC
 651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC
 701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
 751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
 801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG
 851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC
 901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT
 951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG
1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT
1051 GCGTTTACAG ACGATCCGGA ATAA

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

   1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
  51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
 101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
 151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
 201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
 251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
 301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
 351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:

                                                  10        20        30
orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA
                                          ||||||||||||||:|||||||||||||||
orf130a     LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVVYKNIAITFLLLHAA
               140       150       160       170       180       190

                    40        50        60        70        80        90
orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX
            |||||||||||||:|||||||||||||||||||||||||||||||||||||| ||||||
orf130a     AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK
               200       210       220       230       240       250

                   100       110       120       130       140       150
orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf130a     LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
               260       270       280       290       300       310

                   160       170       180       190
orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX
             | |||| ||||||||||||||||||::|:||||||||||||||
orf130a     VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX
               320       330       340       350

The complete length ORF130a nucleotide sequence <SEQ ID 849> is: 



   1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
  51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
 101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
 151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
 201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
 251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
 301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
 351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
 401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
 451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
 501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA
 551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT
 601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC
 651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC
 701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
 751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
 801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG
 851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC
 901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT
 951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG
1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC
1051 GCGTTTACAG ACGATCCGGA ATAA



This encodes a protein having amino acid sequence <SEQ ID 850>: 

   1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
  51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
 101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
 151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
 201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
 251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
 301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
 351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap: 

orf130a.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf130-1     AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orf130a.pep  YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
             |||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||
orf130-1     YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
             ||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf130-1     LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep  IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE
             |||||||||||||| |||||||||||||||||||||||||:||:|||||||||||||
orf130-1     IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE

Homology with a predicted ORF from N. gonorrhoeae 

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from 
N. gonorrhoeae: 



orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30
                                          ||||||||||||||::||||||| ||||||
orf130ng    LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201

orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90
            |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
orf130ng    AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 261

orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150
            |||||||||||||||||| |||||||||||||||||||||||||||||| ||||:|||||
orf130ng    LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321

orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193
             | |||| |||||| |||||||:|||::|:|||||||||||||
orf130ng    VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364



An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino
acid sequence <SEQ ID 852>:

   1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
  51 RRFFDYRFVG PDGFFRQPET CRYFDGGWA CCGCFIAVFT ATCRIFRRRL
 101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
 151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
 201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
 251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
 301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
 351 VPIFRANAFT DDPE*



Further work revealed the following gonococcal DNA sequence <SEQ ID 853>: 



   1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT
  51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT
 101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG
 151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT
 201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC
 251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
 301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT
 351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG
 401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG
 451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA
 501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA
 551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG
 601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA
 651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA
 701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC
 751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC
 801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC
 851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC
 901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA
 951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG
1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG
1051 TTTACAGACG ATCCGGAATA A

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

   1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL
  51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC
 101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
 151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA
 201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG
 251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI
 301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA
 351 FTDDPE*

ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:

orf130-1.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
              ||||||||||||||||||||||||||:|||||||||||||||||||| |||| |||||||
orf130ng-1    MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL

orf130-1.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
              ||:|||||:|||:|:::||| || |:||||||||||||||| ||||||||||||||||||
orf130ng-1    KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA

orf130-1.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV
              |||||||||||||||||||||||||||||||||||||:|||:|:||||||||||||||::
orf130ng-1    AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI

orf130-1.pep  YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
              ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130ng-1    YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130-1.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR
              |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
orf130ng-1    LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130-1.pep  IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX
              ||| ||||:||||| ||||||||||||| |||||||:|||:||:||||||||||||||
orf130ng-1    IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 101

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
351 CTGCTTGGAA AAG..

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR
101 TRDGKPLIET FKQGGFDCLE K..

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*
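
The correspondence between the complete nucleotide sequence <SEQ ID 857> and the amino acid sequence <SEQ ID 858> can be checked by translating the DNA in frame 1 with the standard genetic code. The following is a minimal Python sketch under that assumption; the names `translate`, `CODON_TABLE` and `SEQ_ID_857` are illustrative and do not come from the original disclosure, which does not state what software was used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (illustrative construction; any standard codon table would do).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with undetermined bases become 'X'."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# SEQ ID 857 as listed above (whitespace is ignored by translate()).
SEQ_ID_857 = """
ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA
CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT
GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA
CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT
ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA
CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
GATGGTAA
"""

print(translate(SEQ_ID_857))
```

With this convention the terminal `TAA` codon is rendered as `*`, matching the listing, and an undetermined base (such as the `.` in the partial sequence <SEQ ID 855>) produces `X` at the affected residue.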

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF131 shows 95.0% identity over a 121aa overlap with an ORF (ORF131a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf!3i .pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 
I I I I I I I I M ! M I 1 1 I I I I I I M I I I I I I I i I : I I I I I I I I I I I I I I II I I II I i I I I 
or f 131a - MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
20 10 20 30 40 50 60 

70 80 90 100 110 120 

orf!31.pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 
MINIMI I I t I I I I I I I I i I I I M 1 I I I I I I I I I I II M I I I M I I I I I M I M : 
25 orfl31a YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 

70 80 90 100 110 120 

orfl31.pep K 

30 i 

orfl31a KQGLRRNGLSERVRWX 
130 

The complete length ORF131a nucleotide sequence <SEQ ID 859> is: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA
351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW*

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap: 

orf 131a. pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
I i M I M I I I i I II I I I M M I I I I I I I I I I M : M I M M I I II II I I j I 1 I I I I I I I 
50 orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orf 131a. pep YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 
I I I I I I I I I I I I I M I I I M I I I I I I ! I II I I I I I I I I I I I I I I II I I I I I I I Mill: 
^ orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 

orf 131a . pep KQGLRRNGLSERVRWX 

I i it I 1 1 1 1 1 1 1 1 i 1 1 
orf131-1 KQGLRRNGLSERVRWX
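
The identity figure quoted for this gap-free alignment can be reproduced by counting identical positions. The sketch below assumes the simplest definition (identical residues divided by overlap length, terminal stop excluded); the function name `percent_identity` and the variable names are illustrative, and the original alignment program may apply its own conventions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF131a <SEQ ID 860> and ORF131-1 <SEQ ID 858>, stop codon omitted.
ORF131A = (
    "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED"
    "YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK"
    "KQGLRRNGLSERVRW"
)
ORF131_1 = (
    "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD"
    "YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE"
    "KQGLRRNGLSERVRW"
)

identity = percent_identity(ORF131A, ORF131_1)
print(f"{identity:.1f}% identity over {len(ORF131A)} aa")
```

The two sequences differ at four of 135 positions, which is consistent with the 97.0% figure stated above. Alignments containing gaps would need the gap columns handled explicitly before such a count is meaningful.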

Homology with a predicted ORF from N. gonorrhoeae 

ORF131 shows 89.3% identity over 121 aa overlap with a predicted ORF (ORF131ng) from
N. gonorrhoeae:

orf 131 .pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60 

| | | | : | | | I I I I I : I I 11 ] I i I It I I I I I I I : I I I M I I I I I H I I I M I I II I I I 
orf 131ng ME I RVIKYT AT AALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWD I GGESPLSLED 60 

10 orfl31 pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120 

| | | | | I I I I I I I I I I I I I II : I I I I I I I I I I I I I I I I M 1 I I I M : I 1 I 1 MUM 
orf 131ng YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120 

orfl31.pep K 121 
I 

orfl31ng KQGLRRNGLSERVRW 134 

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GtccgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT
251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA
351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC
401 GATGGTAA

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:



orf 13 lng- 1 .pep MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED 
II I I : II II I M I : I I II I II I I I I M I II M : I I M M I I I I I I II I I II II I II I 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orf 13 lng- 1 .pep YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 
I || I I | II I I II I I I ! I I I I I : 1 M I M I I I M I II M I I I I I I II I : I III I I II I I 
orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 



45 orf 131ng-l .pep KQGLRRNGLSERVRWX 

M I i I 1 I I I I I I II I 1 
orfl31-l KQGLRRNGLSERVRWX 



Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
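
A lipid attachment site of this kind is commonly located by scanning for a short consensus pattern ending in the lipidated cysteine. The Python sketch below is illustrative only: the regular expression is an assumption, written from the PROSITE PS00013 consensus as commonly cited, `{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C`, and should be checked against the current PROSITE entry. The original text does not state which tool or pattern was used, and the additional PROSITE positional rules are not implemented here.

```python
import re

# Assumed consensus for a prokaryotic membrane lipoprotein lipid attachment
# site (transcribed from PROSITE PS00013 as commonly cited; verify before use).
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")

# N-terminal region of ORF131-1 <SEQ ID 858>.
ORF131_1_NTERM = "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPR"

match = LIPOBOX.search(ORF131_1_NTERM)
if match:
    # match.end() is the 1-based position of the candidate lipidated cysteine.
    print(f"candidate site {match.group()} with Cys at position {match.end()}")
else:
    print("no candidate site found")
```

Under these assumptions the scan picks out the segment ending at the cysteine of `TVAGC`, near the N-terminus of ORF131-1.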
Example 102

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865>:



1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT
51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG
151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA
201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT
251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC
301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC
351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC
401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG
451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG
501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA
551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA
601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT
651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG
701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA
751 AAAATTCGGC ACGGAACACG GCTGGCA..



This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 



1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV
51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP
151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR
201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG
251 KIRHGTRLA..



Further work revealed the complete nucleotide sequence <SEQ ID 867>: 



1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT
51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG
151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA
201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT
251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC
301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC
351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC
401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC
451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT
501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT
551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA
651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC
701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG
801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG
851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC
901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC
951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA
1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC
1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG
1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT
1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG
1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 GGAAAGCTGC TGGAAGCTTT GAGATAG



This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>:

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV
51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA
301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV
401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH
451 GKLLEALR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with the hypothetical o457 protein of E.coli (accession number U14003)
ORF132 and o457 show 58% aa identity in 140 aa overlap:

ORF132:    4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63
             IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE  GI++ +G+DA+QL+  +
o457:      3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

ORF132:   64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123
              D+ +IGN   RG   VEA+L   +PY+SGPQWL + VL   WVL VAGTHGKTTTA M
o457:     62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

ORF132:  124 AWVLEYAGLAPGFLIGGVXG 143
              W+LE  G  PGF+IGGV G
o457:    122 TWILEQCGYKPGFVIGGVPG 141

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF132 shows 74.6% identity over a 189aa overlap with an ORF (ORF132a) from strain A of N.
meningitidis:



orf 132 . pep 
orfl32a 



orf 132. pep 



orf!32a 



orf 132. pep 
orf 132a 



orf 132 .pep 
orfl32a 



10 20 30 40 50 60 

MKHIKI IGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

I I ! M I I I 11 I II M I : I I M I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : It I I 
MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 
10 20 30 40 50 60 

70 80 90 100 • 110 120 

EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 

I I I I I I I I M II 1 I M I I I I I I I I I I I I I II I I I I I : I I I I I I I MM I M I II M 
EFKADVYVIGNVAKRGMDWEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 
70 80 90 100 110 120 

130 140 150 160 
SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR 

M M I I M I I M II M Mil : I I : I : I I : II 

SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 
130 140 ISO 160 170 

170 180 190 200 210 220 

HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 

Ml: : : : I 

KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 



The complete length ORF132a nucleotide sequence <SEQ ID 869> is:



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT
51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG
151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA
201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT
251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC
301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC
351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC
401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC
451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT
501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT
551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA
651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC
701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG
801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA
851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC
901 GCGCGTCATG CCGGAGTCGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC
951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA
1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC
1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG
1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT
1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG
1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 ACCAAACTGC TGGACGCTTT GAGATAG



This encodes a protein having amino acid sequence <SEQ ID 870>: 



1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN
101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR
151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE
251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA
301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA
401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH
451 TKLLDALR*



ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 



orf 132a . oep MKH I H 1 1 G I GGT FMGG I AA I AKE AG FEXSGC DAKM Y P PM S TQLE ALG I G VYEG FDT AQLD 
I 1 I I I I I M I I I M I I : I I I I I I I I I I I I M I I I I I I I I I I I M I I I I I 1 I I I : i I I I 
orf 132-1 MKH I H I IG IGGT FMGGLAAI AKEAG FEVSGCDAKMYPPMSTQLEALG IDVYEGFDAAQLD 



orf 132a . peo E FKADV YV I GN VAKRGMD WEAI LNRG LP Y I SG PQWLAENXLHHHWXLGVAXTHGKTTT A 

I I I I I I I I I 1 I I 1 I I II I I I II I I I I I I I I I M I I I : I I I M I I I I I I MINIM 
orf!32-l E FKADVYV I GN VAKRGMD WEAI LNLGLPY I SGPQWLSEN VLHHHWVLG VAGTHGKTTTA 



orf 132a. pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
M M II M M I I M M II II II M : I I M I I II M II M II II II II I II II M I II M 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 



orf 132a .r>eo RSKFVKYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 
I II I I I I I I II I M M I I II M M M II I I I I M I I M M II I M I I M I II I I M II II 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 



orf 132a . pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 
I I I II M II M I II M I I I II II I I I I M II II M I MM I M II I I II I M I M II 
orf 132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 



orf 132a .pep. ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 
i M M I M I I I I I I I : M I I II I I I I I M I M II I I I I I II II I I I I I I I I I II I I I I I I 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETT IQGLRQRVGG 



orf 132a . pep ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 
M I I M I M M M M I I I II M I I : M M I II II II II M I II II I I M I I I M M I II 
orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 



orf 132a. pep FDAFVAE I VKNAEAGDH I LVMSNGG FGG IHTKLLDALRX 

! I I It I I I I I I I I : I I I I ! II I I M ! I I I I M I M M I 
orf!32-l FDAFVAE I VKNAEVGDH I LVMSNGG FGG I HGKLLEALRX 



Homology with a predicted ORF from N. gonorrhoeae

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 

orf 132. pep MKHIHIIGIGGTFMGGLAAI AKEAG FEVSGCDAKMYPPMSTQLEALG IDVYEGFDAAQLD 60 

I I I Ml I M M I I II I M M I I It II : I I M I I M II II I I M M I I I I : II I I II I I : 
orf!32ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60 



orfl32 pep EFKADVWIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120 

| | : j | : | [ | 1 M | : M ! M I I I M I I I I M II I I I I : I I I I I I II I I I I I II I I I I I ! I 
orfl32ng EFQADIYVIGNVARRGMDWEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 120 

5 orf!32 pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180 

I I I I II I I I I I I I t I I M I Illllhllll M I I I I I I M I I I! I I I M I I I 

orfl32ng SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 180 

orfl32 pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 240 

10 1:1111 M I I I M I M I I I I I I I I II I i I I I I I II : I : : I : i I I I I I I 

orf 132ng TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 240 

orf!32.pep FGQRLLDAGGKIRHGTRLA 2 59 

I I I I I I I I I I I I M MM 
15 orf!32ng FGQRLLDAGGKIRHRTRLADW 2 61 

An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino
acid sequence <SEQ ID 872>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV
51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP
151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR
201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG
251 KIRHRTRLAD W*

Further work revealed the following gonococcal DNA sequence <SEQ ID 873>: 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT
51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA
101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA
151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA
201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT
251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac
301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac
351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC
401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC
451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT
501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT
551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC
601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA
651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC
701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA
751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG
801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG
851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC
901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC
951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA
1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG
1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT
1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC
1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG
1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT
1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA
1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC
1351 ACCAAACTGC TGGACGCTTT GAGATAG

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV
51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN
101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR
151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD
201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE
251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA
301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA
401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH
451 TKLLDALR*
ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:

orf 132ng-l .pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 

I II I I I I II I I II' I I I : I I I I M I I I : II I I H I I I I I I I I I I : I I I M M I : 

orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

or^l32ng-l.peo EFQADIYVIGNVARRGMDWEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 

M : II : I I II I I I : I M II I I I I I I I I M I I I I I I I : I II I I I I I II I I I I I I I 

orfl32-l EFKADVYVIGNVAKRGMDWEAI LNLGLPYISGPQWLSENVLHKHWVLGVAGTHGKTTTA 

orf 132ng-l .pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 
I I I I I I I I I I I I I M I I I II I I I I I M I I II I M I I I I I I 1 : I I I I I I I I I I M I I I M I 
orf 132-1 SMLAWVLE YAG LAPGFL I GGV PEN FGVSARLPQTPRQDPNSQSPFFV I EADEYDTAFFDK 

orf 132ng-l .pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT 
I I I I I I II I I I I M I II I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I : I N I I 1 I 
orf 132-1 RSKFVHYRPRTAVLNNLE FDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orf!32ng-l .pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA 
I I I II I I I I I I I I I MM I I : I I II II I M II I I 11:1 Mill II II II I I I I I I 
orfl32-l LDKGCWT P VEK FG T E H G W Q AG E AN ADG S F D V L L DG KT AG R VKW D LMGR HN RMN ALAV I AA 

orf 132ng-l . pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 
I M : 11 I M I I I I I I I I I II I I I I I I I I M I I I II I I I I I M M I I I I I 11 I M I I I I I I 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

orf I32ng-1 . pes ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 
I M I II M I I M M I I M I I I I I I : I I I M II M M II I : II I I I I II II II M MM 
orfl32-l AR I LAV LE PRSNTMKLGTMKS ALPVS LKEADQVFC YAGGVDWDVAEALAPLGGRLNVGKD 

orf 132ng-l .pep FDT FVAE I VKN ARTG DK I L VMSN GG FGG I HTKLLDALRX 
! I : I I I I I I I I I :: I I ! I I I! I I M II I I I I It : I I M 
orfl32-l FDA FVAE IVKNAEVGDK I LVMSNGG FGG IHGKLLEALRX 

In addition, ORF132ng-1 is homologous to a hypothetical E.coli protein:

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmba intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query: 22 KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDWE 81 

++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ +IGN RG VE 
Sbjct : 21 RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79 

Query: 82 AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141 

A+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV 
Sbjct : 80 AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139 

Query: 142 PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201 

P NF VSA L +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH 

Sbjct: 140 PGNFEVSAHL GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190 

Query: 202 ADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDTLDKGCWTPVEKFGTGHGWQIG 2 61 

ADIr DL AIQ QFHHLVR VP +G 1+ +L+ T+ GCW+ EG WQ 

Sbjct: 191 ADI FDDLKAIQKQFHHLVRIVPGQGRI IWPENDINLKQTMAMGCWSEQELVGEQGHWQAK 250 

Query: 2 62 EVNADGS-FDVLLDGKKAGHVAWDLMGGHNRMNALAVIAAARHAGVDVQTACEALGAFKN 320 

++ D S ++VLLDG+K G V W L+G HN N L IAAARH GV A ALG+F N 
Sbjct: 251 KLTTDASEWEVLLDGEKVGEVKWSLVGEHNMHNGLMAIAAARHVGVAPADAANALGSFIN 310 

Query: 321 VKRRME IKGT ANG I TVYDD FAHH PT AI ETT I QGLRQRVGG- ARI LAVLE PRSNTMKLGTM 37 9 

+RR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI+AVLEPRSNTMK+G 
Sbjct: 311 ARRRLELRGEANGVTVYDDFAHHPTAILATLAALRGKVGGTARIIAVLEPRSNTMKMGIC 370 

Query: 380 KSALPASLKEADQVF-CYAGGADWDVAEALAPLGCRLRVGKDFDTFVAEIVKNARTGDHI 438 

K L SL AD+VF W VAE D DT +VK A+ GDHI 

Sbjct: 371 KDDLAPSLGRADEVFLLQPAHIPWQVAEVAEACVQPAHWSGDVDTLADMWKTAQPGDHI 430 

Query: 439 LVMSNGGFGGIHTKLLDAL 457
           LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449
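
The summary statistics in the BLAST header above can be read programmatically and cross-checked against the raw counts. The following Python sketch is illustrative: the helper name `parse_blast_summary` and the regular expression are assumptions about the textual layout of this one line, not part of any BLAST library.

```python
import re

SUMMARY = "Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)"


def parse_blast_summary(line: str) -> dict:
    """Return {name: (count, total, reported_percent)} for each statistic."""
    pattern = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")
    return {
        name: (int(count), int(total), int(percent))
        for name, count, total, percent in pattern.findall(line)
    }


stats = parse_blast_summary(SUMMARY)
for name, (count, total, reported) in stats.items():
    # Integer division reproduces the reported value for this line.
    recomputed = 100 * count // total
    print(f"{name}: {count}/{total} -> {recomputed}% (reported {reported}%)")
```

For this line the reported percentages agree with the counts when the ratio is truncated to a whole number, for example 249/439 is about 56.7% and is reported as 56%.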

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen.

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875>:

1 CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA
51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC
101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC
151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG
201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG
251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA
301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT
351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA
401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC
451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA
501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT
551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC
601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT
651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG
701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC
751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG
801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG
851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT
901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC
951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG
1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC
1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA
1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA
1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 ..PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS
51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ
101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV
151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN
201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG
251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI
301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD
351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF*

Further work revealed the further partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT
51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC
101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC
151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA
201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA
251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT
301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA
351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG
401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT
451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA
501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG
551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT
601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA
651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT
701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG
751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT
801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA
851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG
901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG
951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC
1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC
1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC
1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC
1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG
1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG
1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG
1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC
1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT
1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA
1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC
1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA
1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT
1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC
1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC
1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA
1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC
1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT
1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT
1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT
1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG
2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT
2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC
2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG
2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG
2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT
2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC
2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA
2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC
2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT
2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA
2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA
2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG
2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA
2651 TGAGCTACAA GTTTTAA



This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:






1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI
51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG
101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN
151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN
201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW
251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL
301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT
351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL
401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP
451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG
501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF
551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT
601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL
651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS
701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY
751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA
801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV
851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF*



Computer analysis of this amino acid sequence gave the following results: 
Homology with the probable TonB-dependent receptor HI121 of H. influenzae (accession number U32801)
ORF133 and HI121 show 57% aa identity in a 363 aa overlap:

Orf133:  31 IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90
            I E D +L K G K+A NHS ++SA+ DYFMPF + YSRTHRMPN I QEM+ FSQ+ ++GV+TA
HI121:  563 INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622

Orf133:  91 LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150
            LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W
HI121:  623

Orf133: 151
            S G YTI H+ + V YD GRFF N+SYAYQ++ QPTN++DAS PNN
HI121:  681 ESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAYQRTNQPTNYADASPRPNN

Orf133: 211 ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYID
            AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+
HI121:  741 ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN

Orf133: 271 GTNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP
            G+ + R+ ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP
HI121:  801

Orf133: 331
            LDAGNDAA +RYYSS + + C D + C GG+ K+VL NFARGRT++++++
HI121:  860 LDAGNDAASQRYYSSL NNSIECAQDSSAC GGSDKTVLYNFARGRTYILSLN 910

Orf133: 391 YKF
HI121:  911



Homology with a predicted ORF from N. meningitidis (strain A) 
ORF133 shows 90.8% identity over a 392 aa overlap with an ORF (ORF133a) from strain A of N. meningitidis:

10 20 30
orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI
11 I I I 1 I I I I II I I I I I I 1111:1111
orf133a    FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI
450 460 470 480 490 500

40 50 60 70 80 90
orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL
I | 1 1 M I 1 I 1 1 II I I I I I I I 1 I I I I I I I I I I M I II 1 I i I I I I I I i I M II M I I I I I i I
orf133a    YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL
510 520 530 540 550 560

100 110 120 130 140 150
orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS
I I I I II II I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I II II I I I I : It I I I I
orf133a    KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS
570 580 590 600 610 620

160 170 180 190 200 210
orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA
I t [ I I I [ I I I I 1 Mil: 111 I I I I I I I I II I II I I I I M I I I I M I I I I
orf133a    STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA
630 640 650 660 670 680

220 230 240 250 260 270
orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG
M i I M I I I I I I I II I I I I ! I I I I I I II I I I I I II I I I I I I I M I I I I I I M M I I I M
orf133a    SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX
690 700 710 720 730 740

280 290 300 310 320 330
orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL
II! I I I ! I I I I II I ! I I II I II I I I I i I I I I M I I I I I II M I I i I ! M M I I
orf133a    TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL
750 760 770 780 790 800

340 350 360 370 380 390
orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY
I I I II II :: I I II I I I I I II I : I I I I I : I I I II II I I I II I I I I I I I I I I I I : I I M
orf133a    DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY
810 820 830 840 850 860

orf133.pep KFX
I i I
orf133a    KFX
870



A partial ORF133a nucleotide sequence <SEQ ID 879> is: 



1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA
51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG
101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT
151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC
201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT
251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC
301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC
351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA
401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT
451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC
501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC
551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG
601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA
651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA
701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC
751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT
801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT
851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC
901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA
951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG
1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA
1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT
1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC
1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC
1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG
1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC
1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC
1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC
1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT
1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG
1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA
1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG
1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC
1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC
1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT
1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC
1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG
1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG
1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT
1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC
2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG
2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG
2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG
2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT
2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC
2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN
2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG
2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG
2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC
2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA
2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC
2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT
2601 GAGCTACAAG TTTTAA
This encodes a protein having (partial) amino acid sequence <SEQ ID 880>:






1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI
51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV
101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG
151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL
201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI
251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR
301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT
351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR
401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF
451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX
501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP
551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV
601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG
651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG
701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT
751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL
801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS
851 KSVLTNFARG XTFLITMSYK F*



ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap:






10 20 30 40 

orf!33a peo KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS 

I I I I I I M I I 1 I i I I I I I I I I I II I I M I I II I M I I I 
orf 133-1 EAQIQVLEDVHVKAKRVPKDKPCVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS 

10 20 30 40 50 60 

50 60 70 80 90 100 

orf 133a oep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDWK 

|| Ml! I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I M I I I I I I I t I I 

orf 133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWK 

70 80 90 100 110 120 

110 120 130 140 150 160 

orf 133a pep GSFSGSAGINSLAGSANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 
I | 1 | | | I | I I I I I II I I I I II MINIM M I M M M M M I 1 I M M I M M M I I 
orf 133-1 GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

130 140 150 160 170 180 

170 180 190 200 210 220 

orf 133a oeo ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 

i M M I M M M i I M M M I II M M M M I M M M I M M : 1 M I I : M I 

orf 133-1 ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 

190 200 210 220 230 240 

230 240 250 260 270 280 

orf 133a pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 

MUM:: I I I I : : I : I I M M I I MINIM II II II I II II 

or^l33-i WERDLQRQQWKYKPYKNYNN-OELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 

250 260 270 280 290 

290 300 310 320 330 340 

orf 133a Deo LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 

I | | | | N I II I I I I I II I I I M N M I N I I I I I M I I I I I I I I II II I M I I I 

orf 133-1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
300 310 320 330 340 350 

350 360 370 380 390 400 

orf 133a pep yPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 
Mllllllllll I M I I I I M II M I N : I I N I N I I I I I I I I N II I II N M I N 
orf 133-1 ypKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 
360 370 380 390 400 410 

410 420 430 440 450 460 

orf 133a pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 

IN! N I I N I II I M I II N I I II I I I N I N I I I I I II I I I I I II I I N I II 

orf 133-1 EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
420 430 440 450 460 470 
470 480 490 500 510 520
orf133a.pep LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVLKKYGKKRA
I I I I I I I I I II I I I I I I I I I I I I I I I I i I I I I M I I I I I : I I I I I I I I I I I II I I I I
orf133-1    LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA
480 490 500 510 520 530

530 540 550 560 570 580
orf133a.pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN
I I 1 I M I i I I I I I I I I I I I I I M I U M I I I I I I M I I I I I I I II I I I I I I I I I I I M I I
orf133-1    NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN
540 550 560 570 580 590

590 600 610 620 630 640
orf133a.pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF
I I I I I I I I I 1 I I I I I I I I I I I I I I I M I I I II M I I I I : I I I I I I I I M I I I I If I I !
orf133-1    TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF
600 610 620 630 640 650

650 660 670 680 690 700
orf133a.pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS
I I I I I I I I I I I I I I I II I II I I I 1 i I I M I I I I I 1 I II I I I I I I I I I I it I I M I M I 1
orf133-1    KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS
660 670 680 690 700 710

710 720 730 740 750 760
orf133a.pep RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG
I I I I I I I I ] I I I I I I I I I I I I I I I I M I I II I I I I I I I II I I I I I I Ml MINIM
orf133-1    RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG
720 730 740 750 760 770

770 780 790 800 810 820
orf133a.pep KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS
i M I I I I M M I I I I I I I I M II I I II I I M I I I II I I II II I II I i I I II M I II I
orf133-1    KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS
780 790 800 810 820 830

830 840 850 860 870
orf133a.pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX
I I I I I I I I I : I I I I I : I II I I I I I I II I M I II II 11 I i I : I I I I M I
orf133-1    SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX
840 850 860 870 880
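
The identity figures quoted for these pairwise comparisons are the fraction of aligned columns in which both sequences carry the same residue. The following is a minimal sketch of that calculation, applied to the final gap-free block of the ORF133a / ORF133-1 alignment (terminal X omitted). The function name `percent_identity` is hypothetical, and counting the ambiguity code X as an ordinary mismatch is an assumption: the source does not state how its alignment program scored X, so the figures it reports may have been computed differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical columns, compared columns, percent identity).

    Both inputs must already be aligned, i.e. have the same length.
    Assumption: 'X' is treated like any other residue, so X vs R is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column that is a gap in both rows carries no information
        compared += 1
        if a == b and a != gap:
            identical += 1
    pct = (100.0 * identical / compared) if compared else 0.0
    return identical, compared, pct


# Last block of the alignment above, without the terminal X.
orf133a_tail = "SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKF"
orf133_1_tail = "SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKF"

identical, compared, pct = percent_identity(orf133a_tail, orf133_1_tail)
print(f"{identical}/{compared} identical columns ({pct:.1f}%)")
```

The same function can be run block by block or on the concatenated rows; only the column-wise pairing matters.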






Homology with a predicted ORF from N. gonorrhoeae 

ORF133 shows 92.3% identity over a 392 aa overlap with a predicted ORF (ORF133ng) from N. gonorrhoeae:

orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31
I I II I : :\ I II I I II I I I : MM: MM
orf133ng   FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560

orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91
M i II II M II II II M I 1 M M M II M I M : M I I I I i ! I I M M I I M I M I ! M I I
orf133ng   YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620

orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151
II I I M II I I II M I I I I I M II I I I I II I I I M II I I I I II II I I I I II M M II M
orf133ng   KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680

orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211
M I M II I: II I MM: I II II M M M 1 I I I I I I M I II I I II II II II
orf133ng   STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740

orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271
I I I i II M I I I M M I I M II I I M I I M II I M II II I I I I 1 I I M I M I I I I I I I I M
orf133ng   SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331
II 1 I I M I M I II I I M I M I II M I II M I M I M I I I I M M I M II M M M II
orf133ng   TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 860

orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391
* H II : : 1 | | | | | | ! I I I 1 I I I I I II I I I M M I II I I I! I I I I M M I I I
orf133ng   DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920

orf133.pep KF 393
I I
orf133ng   KF 922

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a
protein having amino acid sequence <SEQ ID 882>:

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV
51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN
101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD
151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK
201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY
251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY
301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLNLEYD GVFNKYTAQF
351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF
401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN
451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT
501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY
551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM
601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL
651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH
701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY
751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG
801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN
851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT
901 SKSVLTNFAR GRTFLMTMSY KF*

A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>:

1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT
51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG
101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA
151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca
201 aGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC
251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT
301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT
351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT
401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT
451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG
501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA
551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA
601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC
651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT
701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT
751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT
801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA
851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC
901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA
951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC
1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT
1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA
1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA
1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT
1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT
1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT
1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC
1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA
1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC
1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG
1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG
1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG
1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC
1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA
1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT
1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG
1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA
1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA
1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG
1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA
2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC
2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC
2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT
2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA
2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT
2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT
2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT
2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC
2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT
2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG
2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC
2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC
2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG
2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA
2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC
2751 GATGAGCTAC AAGTTTTAA



This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>:



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV
51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN
101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD
151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK
201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY
251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY
301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF
351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF
401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN
451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT
501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY
551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM
601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL
651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH
701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY
751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG
801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN
851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT
901 SKSVLTNFAR GRTFLMTMSY KF*



ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap:



10 20 30 40 50 60 

orf 133ng-l . pep SFRLKFICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

I I I I I I I i I I I I M i I I I I 1 I I I I I I I I I I 
orfl33-l EAQIQVLEDVHVKAKRVPKDKKVFTDARAV 

10 20 30 



70 80 90 100 110 120 

orf 133ng-l . pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
I 1 I I I : I M : II I I M I I I M I I i I I I I I I I M I I I M M I I I I I M I I I I I M I I I I I I 
orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 



130 140 150 160 170 180 

orf 133ng-l . pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 
I I I I I I I I I I I I M I I I I I I I I I I I I I I I I M I I I i I I I I II I I I I I I I I I I I I I II I II 
orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

100 110 120 130 140 150 



190 200 210 220 230 240 

orf 133ng-l . pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 
II I I I II I I I I II I I I I M I I I I II 11 I I 11 I I I I I I I I I I I I I ! : M 1 I I I I I 1 I ! I I I 
orf 133-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI 

160 170 180 190 200 210 

250 260 270 280 290 300 

orf 133ng-l .pep' GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 
I I I I I I I I ! I I I : I I I II I : I I I I : I M M M I I I I II I I I : I : : MINIMI
orf 13 3-1 GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE 

220 230 240 250 260 

310 320 330 340 350 360 

orfl33ng-l.pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 

I I M I M II I I I I M I II I I : II I I I I I I M II I II II I I I M II : I I M M I 

or^l33-l HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 
270 280 290 300 310 320 



370 380 390 400 410 420 

orfl33ng-l.pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 

Mill I M I I I I I I I I M M I I 1 M M I : II I II II I I I I I I M II II M II I I 

orf 133-1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 
15 " 330 340 350 360 370 380 

430 440 450 4 60 470 480 

or*133ng-l.pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
I | | | | M M M M I M M M M II 11 II I I I I I II M I I M II M M I 11 M I 1 II M II 
20 orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

390 400 410 420 430 440 

490 500 510 520 530 540 

o-f i 33ng-l . pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 
25 M II I M I I II M I M I I I I II II M I I I I I I I I I I : : : M I I I I I M M M : : M M M 

orf 133-1 pQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500 

550 560 570 580 590 600 

30 or^ 33ng-l .peo GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 

I M I |: I |: I I : I I j : ! I M M I I II I M I M I I I I I M II M M I I I : M II M I I II 
orf 133-1 GENSPTYKKHCNRSCGIYE PVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI 

510 520 530 540 550 560 

35 610 620 630 640 650 660 

orf 33ng-l . pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 

MM 11 I I I I M I II I M II I I I M I I I I II II II M I I I I I I I I M II II I I 

orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 
570 580 590 600 610 620 

40 

670 680 690 700 710 720 

0^fl33ng-l .oep VYGKWWDLNGDIPSWVGSTGLAYTIRHRN FKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
M I I II M I I I II I I I : I I M I II I : M I I I I I M I I I M I II II 1 I I II II I M M M I 
orf 133-1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
45 630 640 650 660 670 680 

730 . 740 750 760 770 780 

orf 133ng-l . pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
M II II I I I M I I I I I I I I I I M I ! II I I II I I I I I I I II I II I II M I M I M II I M I 
50 orf 133-1 STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 

690 700 710 720 730 740 

790 800 810 ' 820 830 840 

orf 133ng-l .pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
55 " M II M I I M II I I M I I! II II M I I I 1 I I I 11 M M I M M I I I I I II II I M M I I 

orf 133-1 YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
750 760 770 780 790 800 

850 860 870 880 890 900 

60 orf 133ng-l .pep FRAEVKNLFDRRYIDPLDAGN DAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 

1 I I II M M M I I I II I I I II I I II 1 I I I I I I i I 1 II I II I I I I I I M I II M II 

orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
810 820 830 840 850 860 

65 910 920 

orfl33ng-l.pep VLTNFARGRT FLMTMSYKFX 
I M II II I I I II I M M I I I 
orf!33-l VLTN FARGRT FLMTMS YKFX 

870 880 

In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae:
sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog -
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913
Score = 930 bits (2377), Expect = 0.0
Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%)

Query: 38 QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97

+ L + V K + DKK FT+A+A STR++VFK + +D ++RS I PGAFTQQDK SG+V 
29 ETLGQIDWEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGW 88 

98 SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFS 157 

S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
8 9 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 14 8 

158 GSAGINSLAGSANLRTLGVDDWQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217 

G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 

14 9 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
209 YVGWYGYSQREVSQDYRI-GGGERLASLGQDILAKEKEAYF-RNAGYILNP-EGQWTPD 265 

278 LQRQYWK TKWY ■ KKYEDPQELQK YIEE 303 

L +++W +Y KK +D ++LQK IEE 

2 66 LSKKHWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI + P L+ + S +L K EY AQ R L+ +IGSRKI 

32 6 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

3 64 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

38 5 NRNYQVNYNFNNNSYLDLNLMAAHNIGKTIYPKGGFFAGWQVADKLITKNVANIVDINNS 44 4 

4 2 4 AT FRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSY — LGRFKGDKG 4 81 

TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS+ GR+ G K 

4 45 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEELSLFYNDASHDQGLYSHSKRGRYSGTKS 504 

4 82 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541 

LLPQ+S I + QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 
505 LLPQRSVILQPSGKQKFKTVYFDTALSKGI YHLNYSVNFTHYAFNGEYVGY 555 

54-? AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

556 ENTAGQQ INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 
605 NI QEM FFSQV SN AGVNT ALKPEQS DT YQLG FNT YKKGLFTQDDVLGVKLVGYRS FIKNY I 664 

662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRN FKDKVHKHGFELELNYDYGRFFTNLSYAY 721 

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

665 HNVYGVWW — RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 
Q ++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 

723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

782 MRYFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKN 841 

RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+ 

783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 
LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 
: 842 L 1 1 KAE VQN L L DKR YVD P L D AGN DAAS QR YY SSL NNSIECAQDSSAC GGSD 892 



Query: 902 KSVLTNFARGRTFLMTMSYKF 922
           K+VL NFARGRT++++++YKF
Sbjct: 893 KTVLYNFARGRTYILSLNYKF 913
The underlined motif in the gonococcal protein (also present in the meningococcal protein) is
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for
vaccines or diagnostics, or for raising antibodies.

Example 104

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885>:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG
151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG
401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG
451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT..

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML
51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKNSVINVR EMLPDH...
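The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in reading frame 1 with the standard genetic code. The following is only an illustrative sketch: the function name `translate` and the choice to report any codon containing a non-ACGT symbol as `X` are assumptions made for this example, not part of the original analysis.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace and case are ignored.

    Codons containing anything other than A, C, G or T (for example N)
    are reported as X in this sketch. Trailing bases that do not fill a
    complete codon are dropped.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First row of SEQ ID 885 as listed above (50 nt, so 16 complete codons).
row1 = "ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT"
print(translate(row1))  # MNLISRYIIRQMAVMA
```

The printed residues agree with the start of SEQ ID 886 (MNLISRYIIR QMAVMA...).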

Further work revealed a further partial nucleotide sequence <SEQ ID 887>:




This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML
51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ
201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT
251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG
301 LKLFGGICXG LLFHLAGRLF GFTSQL...

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 



1 ATGAACCTGA
51 TTACGCGCTC
101 ACGAAACCGG
151 gGCTACACCG
201 CGCCGTCCTT
251 GCGAACTGAC
301 TTGATTCTGT
351 CGGCGAATGG
401 CCGCCGCCAT
451 AAAGAAAAAA
501 GCTTTTGGGC
551 AGGCAGTGGA
601 TTGAAAAACA
651 TATTGCGGCT
701 ACGTATTGCT
751 TACATCCGCC
801 CGCATGGTGG
851 TCGTCGCCTT
901 TTAAAACTCT
951 ACGGCTCTTT

Homology with a predicted ORF from N. meningitidis (strain A)

ORF112 shows 96.4% identity over a 166aa overlap with an ORF (ORF112a) from strain A of N. meningitidis:






orf 112 .pep 
orf 112a 



orf!12.pep 
orf 112a 



orf 112 .pep 
orf 112a 



10 20 30 40 50 60 

MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

| | | | | M 1 I I M I I I I I I t I I I I I I I I I M I I I I I I M I M II I I I I II I I I I I I I II 
MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 

10 20 30 40 50 60 

70 80 90 100 110 120 

AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

I I I I : M I It I I I I I I I M I I It I I : I I I I I I I I M I I I I I I I I I I M II I I I I I I I I I 
AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
70 80 90 100 110 120 

130 140 150 160 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 
I M M M 1 i I I I I II I I I I I I I I II t i I I I II I M : I I I I I I I I I I 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 
130 140 150 160 170 180 



ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
190 200 210 220 230 240 



orf 112a 

The ORF112a nucleotide sequence <SEQ ID 889> is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG
151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGAK CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG
401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG
451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC
501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG
601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC
651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG
701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC
751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT
801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC
901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG
951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG
1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA
1051 CGCAAACAGG AAAAACGCTA A



This encodes a protein having the amino acid sequence <SEQ ID 890>: 



1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX
51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ
201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT
251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG
301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI
351 RKQEKR*



ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap:

orf112a.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR
I I I I I It I I I If M I M I I I I I II I I I I I M II M 1 I I I I 1 I I I I I I 1 I I I I II II II 
orf 112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

orf 112a . pep AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 






WO 99/24578 

orfll2-l 
orfll2a.pep 
orf!12-l 
orf 112a. pep 
orfll2-l 
orf 112a .pep 
orfll2-l 
orf 112a. pep 
orfl!2-l 
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I | | | : | M I I II I I I I I I I I I H I I : I I I I I I I I I I I I I I I I I ! M II I I t I I I M M I 
AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN

| | M I i 1 I I I II i i I II I I I I I I I I I I I I I M i II I I I I I I I I I M I I I I I I I I I M M 
VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN

ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP

| | | | M | | I I I I I I I I I II II I I I I I I I I I I I M I I I I I I I I I I I I I I I I I M M M I! 
ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP

DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFT PQTTRHGNMG 

INN I i I I I I I I I I I I I I M I I I I I M I I M I I III I I U M II 1 I 

DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG

LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 

II Mill I I 1 1 I I I I I I I Mill 
LKLFGGICXGLLFHLAGRLFGFTSQL 



Homology with a predicted ORF from N. gonorrhoeae 

ORF112 shows 95.8% identity over a 166aa overlap with a predicted ORF (ORF112ng) from N. gonorrhoeae:

orf 112 . pep 
orf 112ng 



MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
E I 1 I I I 1 I I I I I I I M M M I I I I I I I I I M I II I M I I I I I I I I M I I I I I I II I I I M 
MNLISRYI IRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 



60 



60 



orf 111 



orf 112ng 
orf 112 . pep 
orf 112ng 



AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120
1 I I I : I I I I M I I I : I I I I M 1 I I M : I I I M I I I I I I 1 I M I I I I I i I 1 I I I : M I I 1 I 
AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120 



VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 
I I I I ! I I II I I I M I I I M I II I I I I I I I I M I : I : I I M Mill 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 



35 



40 



45 



50 



55 



OIL 1 IZiiy vrtr- i ijjvi^ 111 ^.luwii-nui^o o.^-^.. ^.-v^^v^ 

The complete length ORF112ng nucleotide sequence <SEQ ID 891> is:



166 



180 



1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG
151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT
201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA
251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG
301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT
351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag
401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG
451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC
501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG
601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC
651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG
701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC
751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT
801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC
851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC
901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG
951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG
1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA
1051 CGCAAACAGG AAAAACGTTG A



This encodes a protein having amino acid sequence <SEQ ID 892>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML
51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL
101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL
151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ
201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT
251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG
301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI
351 RKQEKR*

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap:

10 20 30 40 50 60 

MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
II I It I I I I I I I M I I I I I 1 I I I I I M I I I M I I I I I I I I I I I I I II I I I I M I I I I I I I 
MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
10 20 30 40 50 60 



orf 112ng 
orfl!2-l 






70 80 90 100 110 120 

orf 112ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW
I I I I : II I I I I M 1 : I I I N I I I I I I : I I I I I I I I I II I I I M II I I I I! II I : I I I ! i I 
orf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 112ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 
I 1 I I I 11 I I I I I I I I M I I I I i I I I I I I II I II : I I II I II I I I I I I I I I I I I I I I I I 
•orf 112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 112ng ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 
I I I I I I I I I I I I I I I II II I I I I I I I I : I I I : 1 : I M I I : II I : I : I I M I I I I I I I 
orf 112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 

190 200 210 220 * 230 240 

250 260 270 280 290 300 

orf 112ng DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG

I I I I I II I M I I I I II I I I I I I : I I I I I I I I I I II I : I I I I I I I I I I I I I I I I II I M II 
orfll2-l DQMSVGELTTYIRHLQNNSQNTRI YAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

250 260 270 280 290 300 

310 320 330 340 350 

orf 112ng LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX 

II I I I I II I I I I I I I I I I I II I I II 
orf 112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 

310 320 



40 



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
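The identity figures quoted in this example can be checked from the listed sequences. The sketch below assumes the two sequences are already aligned without gaps over the overlap (which holds for the ORF112 comparisons listed here) and that an unresolved residue X counts as a mismatch; both are assumptions of this illustration, and the function name `percent_identity` is hypothetical rather than the name of the program used in the original analysis.

```python
def percent_identity(a, b):
    """Percent identical positions over the overlap of two pre-aligned,
    ungapped sequences. The overlap is the length of the shorter one."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / overlap


# ORF112 (SEQ ID 886, 166 aa as listed).
ORF112 = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEML"
    "GYTALKMPARAYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLL"
    "LILSQFGFIFAIATVALGEWVAPTLSQKAENIKAAAINGKISTGNTGLWL"
    "KEKNSVINVREMLPDH"
)
# First 166 residues of ORF112a (SEQ ID 890 as listed).
ORF112A_START = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMX"
    "GYTALKMXARAYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLL"
    "LILSQFGFIFAIATVALGEWVAPTLSQKAENIKAAAINGKISTGNTGLWL"
    "KEKNSIINVREMLPDH"
)

print(round(percent_identity(ORF112, ORF112A_START), 1))  # 96.4
```

Six of the 166 positions differ (four of them unresolved X residues in ORF112a), giving 160/166, which rounds to the 96.4% stated for this pair.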



It will be appreciated that the invention has been described by means of example only, and that
modifications may be made whilst remaining within the spirit and scope of the invention.

TABLE I - PCR primers

ORF       Primer    Sequence                                  Restriction sites

ORF 1     Forward   CGCGGATCCGCTAGC-GGACACACTTATTTCGG         BamHI-NheI
          Reverse   CCCGCTCGAG-CCAGCGGTAGCCTAATT              XhoI
ORF 2     Forward   GCGGATCCCATATG-TTTGATTTCGGTTTGGG          BamHI-NdeI
          Reverse   CCCGCTCGAG-GACGGCATAACGGCG                XhoI
ORF 2-1   Forward   GCGGATCCCATATG-TTTGATTTCGGTTTGGG          BamHI-NdeI
          Reverse   CCCGCTCGAG-TGATTTACGGACGCGCA              XhoI
ORF 4     Forward   GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC         BamHI-NdeI
          Reverse   CCCGCTCGAG-TTTGGCTGCGCCTTC                XhoI
ORF 5     Forward   GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC     NdeI-NcoI
          Forward   CGGGATCC-ATGGAAGGCGCACAAC                 BamHI
          Reverse   CCCGCTCGAG-GACTGTGCAAAAACGG               XhoI
ORF 6     Forward   CGCGGATCCCATATG-ACCCGTCAATCTCTGCA         BamHI-NdeI
          Reverse   CCCGCTCGAG-TGCGCCGAACACTTTC               XhoI
ORF 7     Forward   CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC         BamHI-NheI
          Reverse   CCCGCTCGAG-TTTCAAAATATATTTGCGGA           XhoI
ORF 8     Forward   GCGGATCCCATATG-GCTCAACTGCTTCGTAC          BamHI-NdeI
          Reverse   CCCGCTCGAG-AGCAGGCTTTGGCGC                XhoI
ORF 9     Forward   CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA         BamHI-NdeI
          Reverse   CCCGCTCGAG-TTTCCGAGGTTTTCGGG              XhoI
ORF 10    Forward   GCGGATCCCATATG-GACACAAAAGAAATCCTC         BamHI-NdeI
          Reverse   CCCGCTCGAG-TAATGGGAAACCTTGTTTT            XhoI
ORF 11    Forward   GCGGATCCCATATG-GCGGTCAACCTCTACG           BamHI-NdeI
          Reverse   CCCGCTCGAG-GGAAACGACTTCGCC                XhoI
ORF 13    Forward   CGCGGATCCCATATG-GCTCTGCTTTCCGCGC          BamHI-NdeI
          Reverse   CCCGCTCGAG-AGGGTGTGTGATAATAAG             XhoI
ORF 15    Forward   GGAATTCCATATGGCCATGG-GCGGGACACTGACAG      NdeI-NcoI
          Forward   CGGGATCC-TGCGGGACACTGACAGG                BamHI
          Reverse   CCCGCTCGAG-AGGTTGGCCTTGTCTATG             XhoI
ORF 17    Forward   GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG      NdeI-NcoI
          Forward   CGGGATCC-ATTGCCGGCCTGTTCG                 BamHI
          Reverse   CCCGCTCGAG-AAGCAGGTTGTACAGC               XhoI


ORF18 


Forward 
Reverse 


GCGGATCCCATATG-ATTTTGCTGCATTTGGAT 
CCCGCTCGAG-TCTTCCAATTTCTGAAAGC 


BamHI-Ndel 
Xhol 


ORF19 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC

CGGGATCC-TTCGCCAGTGTTTTTACCG 

CCCGCTCGAG-GGTGTTTTTGAAGCTGCC 


Ndel-Ncol 

BamHI 

Xhol 


ORF 20 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG -TCGGCGCGGGTATG 

CGGGATCC-TTCGGCGCGGGTATG 

CCCGCTCGAG-CGGCGAGCGAGAGCA 


Ndel-Ncol 

BamHI 

Xhol 


ORF 22 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT 
CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC
CCCGCTCGAG-ATTATGATAGCGGCCC 


Ndel-Ncol 

BamHI 

Xhol 


ORF 23 


Forward 
Reverse 


CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC 
CCCGCTCGAG-TTTAAACCGATAGGTAAACG 


BamHI-Ndel 
Xhol 


ORF 24 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG - TGATGCCGGAAATGGTG 

CGGGATCC-ATGATGCCGGAAATGGTG 

CCCGCTCGAG-TGTCAGCGTGGCGCA 


Ndel-Ncol 

BamHI 

Xhol 


ORF 25 


Forward 
Reverse 


GCGGATCCCATATG-TATCGCAAACTGATTGC 
CCCGCTCGAG-ATCGATGGAATAGCCG 


BamHI-Ndel 
Xhol 


ORF 26 


Forward 
Reverse 


GCGGATCCCATATG -CAGCTGATCGACTATTC 
CCCGCTCGAG-GACATCGGCGCGTTTT 


BamHI-Ndel 
Xhol 


ORF 27 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG- AGACCTATTCTGTTTA 
CGGGATCC- CAGACCTATTCTGTTTATTTTAATC 
CCCGCTCGAG-GGGTTCGATTAAATAACCAT 


Ndel-Ncol 

BamHI 

Xhol 


ORF 28 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG- ACGGCTGTACGTTGATGT 
CGGGATCC- AACGGCTGTACGTTGATG 
CCCGCTCGAG-TTTGTCAGAGGAATTCGCG 


Ndel-Ncol 

BamHI 

Xhol 


ORF 29 


Forward
Forward
Reverse


GCGGATCCCATATG -AACGGTTTGGATGCCCG 
CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG 
CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG 


BamHI-Ndel 
BamHI-Nhel 
Xhol 


ORF 32 


Forward 
Reverse 


CGCGGATCCCATATG-AATACTCCTCCTTTTG 
CCCGCTCGAG-GCGTATTTTTTGATGCTTTG 


BamHI-Ndel 
Xhol 


ORF 33 


Forward 
Reverse 


GCGGATCCCATATG -ATTGATAGGGATCGTATG 
CCCGCTCGAG-TTGATCTTTCAAACGGCC 


BamHI-NdeI
XhoI


ORF 35


Forward 
Forward 
Reverse 


GCGGATCCCATATG-TTCAGAGCTCAGCTT 

CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT 

CCCGCTCGAG-AAACAGCCATTTGAGCGA 


BamHI-Ndel 
BamHI-Nhel 
Xhol 


ORF37 


Forward 
Reverse 


GCGGATCCCATATG-GATGACGTATCGGATTTT 
CCCGCTCGAG-ATAGCCCGCTTTCAGG 


BamHI-Ndel 
Xhol 


ORF 58 


Forward 
Reverse 


CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT
CCCGCTCGAG-AGCATTGTCCAAGGGGAC 


BamHI-Nhel 
Xhol 


ORF 65 


Forward 

Forward 
Reverse 


GGAATTCCATATGGCCATGG -TGCTGTATCTGAATCAAG 

CGGGATCC-TTGCTGTATCTGAATCAAGG 
CCCGCTCGAG-CCGCATCGGCAGACA 


Ndel-Ncol 

BamHI 

Xhol 


ORF 66 


Forward 
Reverse 


GCGGATCCCATATG-TACGCATTTACCGCCG
CCCGCTCGAG-TGGATTTTGCAGAGATGG 


BamHI-Ndel 
Xhol 


ORF 72 


Forward 
Reverse 


CGCGGATCCCATATG- AATGCAGTAAAAATATCTGA 
CCCGCTCGAG-GCCTGAGACCTTTGCAA 


BamHI-Ndel 
Xhol 


ORF 73 


Forward 
Reverse 


GCGGATCCCATATG-AGATTTTTCGGTATCGG 
CCCGCTCGAG-TTCATCTTTTTCATGTTCG 


BamHI-Ndel 
Xhol 


ORF 75 


Forward 
Reverse 


GCGGATCCCATATG- TCTGTCTTTCAAACGGC 
CCCGCTCGAG-TTTGTTTTTGCAAGACAG 


BamHI-Ndel 
Xhol 


ORF 76 


Forward 
Reverse 


GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC
CGGGATCC-TTACGGTTTGACACCGTT 


Nhel-Ndel 
BamHI 


ORF 79 


Forward 
Reverse 


CGCGGATCCCATATG -GTTTCCGCCGCCG 
CCCGCTCGAG-GTGCTGATGCGCTTCG 


BamHI-Ndel 
Xhol 


ORF 83 


Forward 
Reverse 


GCGGATCCCATATG- AAAACCCTGCTGCTGC 
CCCGCTCGAG-GCCGCCTTTGCGGC 


BamHI-Ndel 
Xhol 


ORF 84 


Forward 
Reverse 


GCGGATCCCATATG- GCAGAGATCTGTTTG 
CCCGCTCGAG-GTTTGCCGATCCGACCA 


BamHI-Ndel 
Xhol 


ORF 85 


Forward 
Reverse 


CGCGGATCCCATATG- GCGGTTTGGGGCGGA 
CCCGCTCGAG-TCGGCGCGGCGGGC 


BamHI-Ndel 
Xhol 


ORF 89 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA 

CGGGATCC-GCCATACCTTCTTATCAGAG 

CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC 


Ndel-Ncol 

BamHI 

Xhol 


ORF 97


Forward
Reverse


GCGGATCCCATATG-CATCCTGCCAGCGAAC
CCCGCTCGAG-TTCGCCTACGGTTTTTTG


BamHI-NdeI
XhoI


ORF 98 


Forward 
Reverse 


GCGGATCCCATATG-ACGGTAACTGCGG 
CCCGCTCGAG-TTGTTGTTCGGGCAAATC 


BamHI-Ndel 
Xhol 


ORF100 


Forward 
Reverse 


GCGGATCCCATATG-TCGGGCATTTACACCG 
CCCGCTCGAG-ACGGGTTTCGGCGGAA 


BamHI-Ndel 
Xhol 


ORF 101 


Forward 
Reverse 


GCGGATCCCATATG-ATTTATCAAAGAAACCTC 
CCCGCTCGAG-TTTTCCGCCTTTCAATGT 


BamHI-Ndel 
Xhol 


ORF 102 


Forward 
Reverse 


GCGGATCCCATATG-GCAGGGCTGTTTTACC 
CCCGCTCGAG-AAACGGTTTGAACACGAC 


BamHI-Ndel 
Xhol 


ORF 103 


Forward 
Reverse 


GCGGATCCCATATG-AACCACGACATCAC
CCCGCTCGAG-CAGCCACAGGACGGC 


BamHI-Ndel 
Xhol 


ORF 104 


Forward 
Reverse 


GCGGATCCCATATG-ACGTGGGGAACGC 
CCCGCTCGAG-GCGGCGTTTGAACGGC 


BamHI-Ndel 
Xhol 


ORF 105 


Forward 
Reverse 


GCGGATCCCATATG-ACCAAATTTCAAACCCCTC 
CCCGCTCGAG-TAAACGAATGCCGTCCAG 


BamHI-Ndel 
Xhol 


ORF 106 


Forward 
Reverse 


GCGGATCCCATATG-AGGATAACCGACGGCG 
CCCGCTCGAG-TTTGTTCCCGATGATGTT 


BamHI-Ndel 
Xhol 


ORF 109 


Forward 
Reverse 


GCGGATCCCATATG-GAAGATTTATATATAATACTCG 
CCCGCTCGAG-ATCAGCTTCGAACCGAAG 


BamHI-Ndel 
Xhol 


ORF110 


Forward 
Reverse 


AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC 
AAACTGCAG-GGAAAACCACATCCGCACTCTGCC 


EcoRI 
PstI 


ORF111 


Forward 
Reverse 


AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA
AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG 


EcoRI 
PstI 


ORF113 


Forward 
Reverse 


AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG 
AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG 


EcoRI 
PstI 


ORF115 


Forward 
Reverse 


AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG 
AAAAAAGTCGAC-CTATTTTTTAGGGGC 7TTTGC TTGTTTGAAAAGCCTGCC 


EcoRI 
Sail 


ORF119 


Forward 
Reverse 


AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG
AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC 


EcoRI 
PstI 


ORF120 


Forward 
Reverse 


AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG 
AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT 


EcoRI 
PstI 


ORF121 


Forward 
Reverse 


AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC 
AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC 


EcoRI
PstI


ORF122


Forward
Reverse


AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCTCC

a n uptpp nr:-TrAGGAArACAAACGATGACGAATATCCGTATC 


Sail 
PstI 


ORF125 


Forward
Reverse


AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT 
AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG 


EcoRI 
PstI 


ORF126 


Forward 
Reverse 


AAAGAATTC-GCGGAAACGGTCGAAG 
AAACTGCAG-TTAATCTTGTCTTCCGATATAC 


EcoRI 
PstI 


ORF127 


Forward 
Reverse 


AAAGAATTC-ATGACTGATAATCGGGGGTTTACG
AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC 


EcoRI 
Sail 


ORF128 


Forward 
Reverse 


AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC 

AAACTGCAG - CTA rTGCAATGCGCCGCC GCGGG AATG ITT GAGCAGGCG 


EcoRI 

PstI


ORF129 


Forward 


AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG 


EcoRI 


Reverse 


AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG 


PstI 


ORF130 


Forward 
Reverse 


AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG
AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT 


EcoRI 
PstI 


ORF 131 



Forward 


GCGGATCCCATATG-GAAATTCGGGCAATAAAAT 


BamHI-Ndel 





Reverse 


CCCGCTCGAG-CCAGCGGACGCGTTC 


Xhol 


ORF 132 


Forward 


GCGGATCCCATATG-AAAGAAGCGGGGTTTG 


BamHI-Ndel 





Reverse 


CCCGCTCGAG-CCAATCTGCCAGCCGT 


Xhol 


ORF 133


Forward 


CGCGGATCCCATATG-GAAGATGCAGGGCGCG 


BamHI-Ndel 




Reverse 


CCCGCTCGAG-AAACTTGTAGCTCATCGT 


Xhol 


ORF 134


Forward 


GCGGATCCCATATG-TCTGTGCAAGCAGTATTG 


BamHI-Ndel 




Reverse 


CCCGCTCGAG-ATCCTGTGCCAATGCG 


Xhol 


ORF 135 


Forward 


GCGGATCCCATATG-CCGTCTGAAAAAGCTTT


BamHI-NdeI




Reverse 


CCCGCTCGAG-AAATACCGCTGAGGATG 


Xhol 


ORF 136 


Forward 


CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC


BamHI-NheI




Reverse 


CCCGCTCGAG-TTCCGAATATTTGGAACTTTT 


Xhol 


ORF 137


Forward


CGCGGATCCCATATG-GGCACGGCGGGAAATA


BamHI-NdeI




Reverse 


CCCGCTCGAG-ATAACGGTATGCCGCC 


Xhol 


ORF 138 


Forward 


GCGGATCCCATATG-TTTCGTTTACAATTCAGGC 


BamHI-Ndel 




Reverse 


CCCGCTCGAG-CGGCGTTTTATAGCGG 


Xhol 


ORF 139 


Forward 


GCGGATCCCATATG-GCTTTTTTGGCGGTAATG 


BamHI-Ndel 




Reverse 


CCCGCTCGAG- TAACGTTTCCGTGCGTTT 


XhoI


ORF140


Forward 
Reverse 


GCGGATCCCATATG-TTGCCCACAGGCAGC 
CCCGCTCGAG-GACGATGGCAAACAGC 


BamHI-Ndel 
Xhol 


ORF141 


Forward 
Reverse 


GCGGATCCCATATG-CCGTCTGAAGCAGTCT 
CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT 


BamHI-Ndel 
Xhol 


ORF142 


Forward 
Reverse 


GCGGATCCCATATG-GATAATTCTGGTAGTGAAG 
CCCGCTCGAG-AAACGTATAGCCTACCT 


BamHI-Ndel 
Xhol 


ORF143 


Forward 
Reverse 


GCGGATCCCATATG-GATACCGCTTTGAACCT 
CCCGCTCGAG-AATGGCTTCCGCAATATG 


BamHI-Ndel 
Xhol 


ORF144 


Forward 
Reverse 


GCGGATCCCATATG-ACCTTTTTACAACGTTTGC 
CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG 


BamHI-Ndel 
Xhol 


ORF147 


Forward 
Reverse 


GCGGATCCCATATG-TCTGTCTTTCAAACGGC 
CCCGCTCGAG-TTTGTTTTTGCAAGACAG 


BamHI-Ndel 
Xhol 



NB: 

- restriction sites are underlined 



- for ORFs 110-130, where the ORF itself carries an EcoRI site (eg. ORF122), a SalI site
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (eg.
ORFs 115 and 127), a SalI site was used in the reverse primer.
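The substitution rule in the note above can be sketched as a simple check for internal recognition sites. The recognition sequences used are the standard ones for EcoRI (GAATTC), PstI (CTGCAG) and SalI (GTCGAC); the function names `contains_site` and `choose_enzymes` are hypothetical, and the sketch does not handle an ORF that also carries a SalI site, a case the note does not address.

```python
# Recognition sequences; all three are palindromic, so scanning one
# strand is enough to detect a site on either strand.
SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "SalI": "GTCGAC"}


def contains_site(orf, enzyme):
    """True if the ORF sequence contains the enzyme's recognition site."""
    return SITES[enzyme] in "".join(orf.split()).upper()


def choose_enzymes(orf):
    """Return (forward, reverse) enzymes following the rule in the note:
    EcoRI/PstI by default, with SalI swapped in for whichever of the two
    has a recognition site inside the ORF."""
    forward = "SalI" if contains_site(orf, "EcoRI") else "EcoRI"
    reverse = "SalI" if contains_site(orf, "PstI") else "PstI"
    return forward, reverse


# Toy sequences for illustration only; they are not taken from the ORFs.
print(choose_enzymes("ATGAAACCCGGG"))     # ('EcoRI', 'PstI')
print(choose_enzymes("ATGGAATTCAAATAA"))  # ('SalI', 'PstI')
print(choose_enzymes("ATGCTGCAGAAATAA"))  # ('EcoRI', 'SalI')
```

The second and third outputs match the pattern seen in Table I for ORF122 (SalI forward, PstI reverse) and for ORFs 115 and 127 (EcoRI forward, SalI reverse).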
TABLE II - Summary of cloning, expression and purification



ORF 


PCR/cloning


His-fusion 
expression 


GST-fusion 
expression 


Purification 


orf 1 


+ 


+ 


+ 


His-fusion 


orf 2 


+ 


+ 


+ 


GST-fusion 


orf 2.1 


+ 


n.d. 


+ 


GST-fusion 


orf 4 


+ 


+ 


+ 


His-fusion 


orf 5 


+ 


n.d. 


+ 


GST-fusion 


orf 6 


+ 


+ 




GST-fusion 


orf 7 


+ 


+ 


+ 


GST-fusion 


orf 8 


+ 


n.d. 


n.d. 




orf 9 


+ 


+ 


+ 


GST-fusion 


orf 10 




n.d. 


n.d. 




orf 11 




n.d. 


n.d. 




orf 13 


+ 


n.d. 


+ 


GST-fusion 


orf 15 


+ 


+ 


+ 


GST-fusion 


orf 17 




n.d. 


n.d. 




orf 18 




n.d. 


n.d. 




orf 19 


+ 


n.d. 


n.d. 




orf 20 


+ 


n.d. 


n.d. 




orf 22 


+ 


+ 


+ 


GST-fusion 


orf 23 


+ 




+ 


His-fusion 


orf 24 


+ 


n.d. 


n.d. 




orf 25 


+ 


+ 


+ 


His-fusion 


orf 26 


+ 


n.d. 


n.d. 




orf 27 


+ 




+ 


GST-fusion 


orf 28 






+ 


GST-fusion 


orf 29 




n.d. 


n.d. 




orf 32 


+ 


+ 


+ 


His-fusion 


orf 33 


+ 


n.d. 


n.d. 




orf 35 




n.d. 


n.d. 




orf 37 


+ 


+ 


+ 


GST-fusion 


orf 58 


+ 


n.d. 


n.d. 




orf 65 




n.d. 


n.d. 




orf 66 


+ 


n.d. 


n.d. 




orf 72 


+ 


+ 


n.d. 


His-fusion 


orf 73 


+ 


n.d. 


+ 


n.d. 


orf 75 


+ 


n.d. 


n.d. 




orf 76 


+ 


+ 


n.d. 


His-fusion 


orf 79


+ 


+


n.d. 


His-fusion 


orf 83 


+ 


n.d. 


+ 


n.d. 


orf 84 


+ 


n.d. 


n.d.




orf 85


+ 


n.d. 


+ 


GST-fusion 


orf89 


+ 


n.d. 


+ 


GST-fusion 


orf 97 


+ 


+ 


+ 


GST-fusion 


orf 98 


+ 


n.d. 


n.d. 




orf 100 


+ 


n.d. 


n.d. 




orf 101 


+ 


n.d. 


n.d. 




orf 102 


+ 


n.d. 


n.d. 




orf 103 


+ 


n.d. 


n.d. 




orf 104 


+ 


n.d. 


n.d. 




orf 105 


+ 


n.d. 


n.d. 




orf 106 


+ 


+ 


+ 


His-fusion 


orf 109 


+ 


n.d. 


n.d. 




orf 110 


+ 


n.d. 


n.d. 




orf 111 


+ 


+ 


n.d. 


His-fusion 


orf 113 


+ 


+ 


n.d. 


His-fusion 


orf 115 


n.d. 


n.d. 


n.d. 




orf 119 


+ 


+ 


n.d. 


His-fusion 


orf 120 


+ 


+ 


n.d. 


His-fusion 


orf 121 


+ 


n.d. 


n.d. 




orf 122 


+ 


+ 


n.d. 


His-fusion 


orf 125 


+ 


+ 


n.d. 


His-fusion 


orf 126 


+ 


+ 


n.d. 


His-fusion 


orf 127 


+ 


+ 


n.d. 


His-fusion 


orf 128 


+ 


n.d. 


n.d. 




orf 129 


+ 


+ 


n.d. 


His-fusion 


orf 130 


+ 


n.d. 


n.d. 




orf 131 


+ 


+ 


+ 


n.d. 


orf 132 


+ 


+ 


+ 


His-fusion 


orf 133 


+ 


n.d. 


+ 


GST-fusion 


orf 134 


+ 


n.d. 


n.d. 




orf 135 


+ 


n.d. 


n.d. 




orf 136 


+ 


n.d. 


n.d. 




orf 137 


+ 


n.d. 


+ 


GST-fusion 


orf 138 


+ 


n.d. 


+ 


GST-fusion 


orf 139 


+ 


n.d. 


n.d. 




orf 140 


+ 


n.d. 


n.d. 




orf 141 


+ 


n.d. 


n.d. 




orf 142 


+


n.d. 


n.d. 




orf 143 


+ 


n.d. 


n.d. 




orf 144 


+ 


n.d. 


+ 


n.d. 


orf 147 


+ 


n.d. 


n.d.


CLAIMS

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ
IDs 2, 4, 6, and 8.

2. A nucleic acid molecule which encodes a protein according to claim 1.

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected
from the group consisting of SEQ IDs 1, 3, 5, and 7.

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 

104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142,
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182,
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222,
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262,
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 

304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342,
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 

504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542,
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 

704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742,
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, & 892.

5. A protein having 50% or greater sequence identity to a protein according to claim 4.

6. A protein comprising a fragment of an amino acid sequence selected from the group
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 

136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174,
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 

336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374,
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 

536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574,
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 

20 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, & 892.. 

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6. 

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 
451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 
651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 
851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 
and 891. 

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the 
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 
93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 
295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493, 
495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 
695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 891. 

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid 
molecule according to any one of claims 8 to 10. 

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence 
identity to a nucleic acid molecule according to any one of claims 8 to 11. 

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any 
one of claims 8 to 12 under high stringency conditions. 

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical. 

17. The use of a composition according to claim 14 in the manufacture of a medicament for the 
treatment or prevention of infection due to Neisserial bacteria. 